Abstract: The authors consider the problem of estimating the density $g$ of independent and identically
distributed variables $X_i$, from a sample $Z_1, \dots, Z_n$ where $Z_i = X_i + \sigma\varepsilon_i$, $i = 1, \dots, n$, $\varepsilon$ is a noise
independent of $X$, with $\sigma\varepsilon$ having known distribution. They present a model selection procedure that allows them
to construct an adaptive estimator of $g$ and to find non-asymptotic bounds for its $L_2(\mathbb{R})$-risk. The estimator
achieves the minimax rate of convergence in most cases where lower bounds are available. A simulation
study illustrates the good practical performance of the method.

Adaptive density deconvolution by penalized contrast.

Summary: The authors consider the deconvolution problem, that is, the estimation of the density
of identically distributed random variables $X_i$ from the observation of $Z_i$, where $Z_i = X_i + \sigma\varepsilon_i$
for $i = 1, \dots, n$, and where the errors $\sigma\varepsilon_i$ have a known density. Using a model selection procedure
that yields non-asymptotic risk bounds, they construct an adaptive estimator of
the density of the $X_i$. The estimator automatically attains the minimax rate in most cases,
whether the errors or the density to be estimated are weakly or highly regular. A simulation study illustrates the
good practical performance of the method.



1. INTRODUCTION 

We observe $Z_1, \dots, Z_n$, $n$ independent and identically distributed (i.i.d.) copies of $Z$ in the model

$Z = X + \sigma\varepsilon$,

where $X$ and $\varepsilon$ are independent random variables, with unknown density $g$ for $X$, known density
$f_\varepsilon$ for $\varepsilon$, and known noise level $\sigma$. In this model, we aim at estimating the density $g$ without any
prior knowledge of its smoothness, using the observations $Z_1, \dots, Z_n$ and the knowledge of the
convolution kernel $\sigma^{-1} f_\varepsilon(\cdot/\sigma)$. The parameter $\sigma$ is only estimable under more restrictive conditions
on $g$, such as a lower bound on its Fourier transform. However, under the usual conditions on $g$ (as
in the current paper), $\sigma$ has to be known. We refer to Butucea and Matias (2005) for the problem
of the estimation of $\sigma$ as well as for results about density deconvolution when $\sigma$ is unknown in such
a model.

In density deconvolution, two factors determine the estimation accuracy: first, the smoothness
of the density to be estimated, $g$, and second, the smoothness of the error density, the worst rates
of convergence being obtained for the smoothest error densities. Indeed, due to the independence
of $X$ and $\varepsilon$, the density $h$ of $Z$ is $h(\cdot) = g * (\sigma^{-1} f_\varepsilon(\cdot/\sigma))$, where $*$ denotes the convolution product,
and if $f_\varepsilon$ is very smooth then so is $h$, the density of the observations, and thus it is difficult to
recover $g$.

In this context, we consider two classes of errors: first, the so-called ordinary smooth errors,
with polynomial decay of their Fourier transform, and second, the supersmooth errors, with Fourier
transform having an exponential decay.

Most previous results concern kernel estimators and densities $g$ to be estimated belonging to
Hölder or Sobolev classes with known order $s$. One can cite among others Carroll and Hall (1988),
Devroye (1989), Fan (1991a, b), Liu and Taylor (1989), Masry (1991), Stefanski and Carroll (1990), 
Zhang (1990), Koo (1999), Cator (2001). 

Smoother densities $g$, with exponential decrease of their Fourier transform, were first considered
by Pensky and Vidakovic (1999), Butucea (2004) and Butucea and Tsybakov (2004). The
latter study sharp optimality (in a minimax sense) using non-adaptive kernel estimators
and provide an adaptive estimator in some special case. The former is the first paper dealing with
adaptivity in a general context. This first adaptive estimator is a wavelet estimator that achieves
the minimax rates when $g$ belongs to some Sobolev class, but fails to reach the minimax
rates when both the error density and $g$ are super smooth. Let us also mention Pensky (2002)
for the estimation of irregular functions and Fan and Koo (2002), who consider wavelet estimators
for densities belonging to Besov spaces. Lastly, analogously to Hesse (1999), Delaigle and Gijbels
(2004a, b) study adaptive methods using cross validation and bootstrap methods in the kernel
context.

In the spirit of Barron et al. (1999), we build an adaptive estimator $\tilde g$, constructed by model
selection, and more precisely by minimization of a penalized contrast function. We show that $\tilde g$
is adaptive in the sense that its construction does not require any prior smoothness knowledge
on $g$ and that its rate of convergence is the minimax rate of convergence (up to some logarithmic
factor) in all cases where lower bounds are previously known, that is, in most cases. More precisely,
we establish non-asymptotic bounds for its integrated quadratic risk that ensure an automatic
trade-off between a bias term and a penalty term, depending only on the observations and on
$\sigma^{-1} f_\varepsilon(\cdot/\sigma)$.

The estimator automatically achieves the best rate obtained by the collection of non-penalized
estimators when the (unknown) optimal space is selected, exactly or sometimes within a negligible
logarithmic factor. In all cases where lower bounds are available, this best rate is the minimax
rate of convergence. In particular, when both the density and the errors are super smooth ($\delta > 0$
and $r > 0$ in $(A_2^\varepsilon)$ and $(R_1^X)$ below), our adaptive estimator significantly improves the rates given
by the adaptive estimator built in Pensky and Vidakovic (1999) whereas both adaptive estimators
have the same rate in the other cases (see Section 4.3).

The paper is organized as follows. In Section 2, we present the assumptions and the estimators.
In Section 3 we give upper bounds for the $L_2(\mathbb{R})$-risk of the estimator, when the smoothness of
$g$ is known, and study the optimality in a minimax sense of the resulting rates. In Section 4, we
give upper bounds for the $L_2(\mathbb{R})$-risk of the penalized minimum contrast estimator $\tilde g$ when no prior
knowledge on the smoothness of $g$ is used. The theoretical results are illustrated by a simulation
study in Section 5, and all the proofs are gathered in Section 6.

2. CONSTRUCTION OF THE ESTIMATORS 

For $u$ and $v$ in $L_2(\mathbb{R})$, $u^*$ denotes the Fourier transform of $u$, $u^*(x) = \int e^{itx} u(t)\,dt$, $u * v$ is the
convolution product, $u * v(x) = \int u(t) v(x - t)\,dt$, $\|u\| = (\int |u|^2(x)\,dx)^{1/2}$, and $\langle s, t\rangle = \int s(x) t(x)\,dx$.

2.1 Model and Assumptions 

We require that $f_\varepsilon$ belongs to $L_2(\mathbb{R})$ and that for all $x \in \mathbb{R}$, $f_\varepsilon^*(x) \ne 0$. We consider that:

$(A_1^{X,\varepsilon})$: The sequences $(\varepsilon_i)_{i\in\mathbb{N}}$ and $(X_i)_{i\in\mathbb{N}}$ are sequences of independent random variables.

The smoothness of $f_\varepsilon$ is described by the following assumption.

$(A_2^\varepsilon)$: There exist nonnegative numbers $\kappa_0, \kappa_0', \gamma, \mu$, and $\delta$ such that $f_\varepsilon^*$ satisfies
$$\kappa_0 (x^2 + 1)^{-\gamma/2} \exp\{-\mu|x|^\delta\} \le |f_\varepsilon^*(x)| \le \kappa_0' (x^2 + 1)^{-\gamma/2} \exp\{-\mu|x|^\delta\}.$$

Only the left-hand side of $(A_2^\varepsilon)$ is required for upper bounds whereas the right-hand side is
useful when we consider lower bounds and optimality, in a minimax sense, of our estimators.

When $\delta = 0$ in $(A_2^\varepsilon)$, the errors are usually called "ordinary smooth" errors, and they are called
"super smooth" when $\mu > 0$ and $\delta > 0$. Indeed densities satisfying $(A_2^\varepsilon)$ with $\delta > 0$ and $\mu > 0$
are infinitely differentiable. The standard examples for super smooth densities are the following:
Gaussian or Cauchy distributions are super smooth of order $\gamma = 0$, $\delta = 2$ and $\gamma = 0$, $\delta = 1$
respectively. For ordinary smooth densities, one can cite for instance the double exponential (also
called Laplace) distribution with $\delta = 0 = \mu$ and $\gamma = 2$. Although densities with $\delta > 2$ exist,
they are difficult to express in a closed form. Nevertheless, our results hold for such densities.
Furthermore, the square integrability of $f_\varepsilon$ and $(A_2^\varepsilon)$ require that $\gamma > 1/2$ when $\delta = 0$.

By convention, we set $\mu = 0$ when $\delta = 0$ and we assume that $\mu > 0$ when $\delta > 0$. In the same
way, if $\sigma = 0$, the $X_i$'s are directly observed without noise and we set $\mu = \gamma = \delta = 0$ in this case.

Although slower rates of convergence for estimating $g$ are obtained for smoother error densities,
those rates can be improved by some additional regularity conditions on $g$. Those regularity
conditions are described as follows.

$(R_1^X)$: There exist some positive real numbers $s, r, b$ such that $g$ belongs to

$$\mathcal{S}_{s,r,b}(C_1) = \left\{\psi : \int_{-\infty}^{+\infty} |\psi^*(x)|^2 (x^2 + 1)^s \exp\{2b|x|^r\}\,dx \le C_1\right\}.$$

$(R_2^X)$: There exists $d > 0$ such that $\forall x \in \mathbb{R}$, $|g^*(x)| \le C_2\,\mathbf{1}_{[-d,d]}(x)$.

The smoothness classes described by $(R_1^X)$ are classically considered both in deconvolution and
in "direct" density estimation, with $\mathcal{S}_{s,0,b}(C_1)$ known as Sobolev classes. The densities satisfying
$(R_1^X)$ with $r > 0$, $b > 0$ are infinitely many times differentiable, admit analytic continuation on a
finite width strip when $r = 1$ and on the whole complex plane if $r = 2$. The densities satisfying
$(R_2^X)$, often called entire functions, admit analytic continuation in the whole complex plane (see
Ibragimov and Hasminskii (1983)).

Subsequently, the density $g$ is supposed to satisfy the following assumption.

$(A_3^X)$: The density $g \in L_2(\mathbb{R})$ and there exists $M_2 > 0$ such that $\int x^2 g^2(x)\,dx \le M_2 < +\infty$.

Assumption $(A_3^X)$, which is due to the construction of the estimator, is quite unusual in density
estimation. Nevertheless it already appears in density deconvolution in a slightly different way
in Pensky and Vidakovic (1999), who assume, instead of $(A_3^X)$, that $\sup_{x\in\mathbb{R}} |x| g(x) < \infty$. It is
important to note that Assumption $(A_3^X)$ is very unrestrictive.

All densities having tails of order $|x|^{-(m+1)}$ as $|x|$ tends to infinity satisfy $(A_3^X)$ only if $m > 1/2$.

One can cite for instance the Cauchy distribution or all stable distributions with exponent $r > 1/2$
(see Devroye (1986)). However, the Lévy distribution, with exponent $r = 1/2$, does not satisfy $(A_3^X)$.

2.2 The projection spaces 

Consider $\varphi(x) = \sin(\pi x)/(\pi x)$, and let $\varphi_{m,j}(x) = \sqrt{L_m}\,\varphi(L_m x - j)$, $m \in \mathcal{M}_n = \{1, \dots, m_n\}$. It
is well known (see for instance Meyer (1990), p. 22) that $\{\varphi_{m,j}\}_{j\in\mathbb{Z}}$ is an orthonormal basis of the
space of square integrable functions having a Fourier transform with compact support included
in $[-\pi L_m, \pi L_m]$. We denote by $S_m$ such a space and by $(S_m)_{m\in\mathcal{M}_n}$ this collection of linear
spaces. In other words

$$S_m = \Big\{\sum_{j\in\mathbb{Z}} a_{m,j}\varphi_{m,j},\ a_{m,j} \in \mathbb{R}\Big\} = \{f \in L_2(\mathbb{R}),\ \text{with supp}(f^*) \text{ included in } [-L_m\pi, L_m\pi]\}.$$

When $L_m = 2^m$, the basis $\{\varphi_{m,j}\}$ is known as the Shannon basis, but we consider here that
$L_m = m$.

In this context, since $g_m = \sum_{j\in\mathbb{Z}} a_{m,j}\varphi_{m,j}$ with $a_{m,j} = \langle g, \varphi_{m,j}\rangle$, the orthogonal projection
of $g$ on $S_m$, involves infinite sums, we also consider the truncated spaces $S_m^{(n)}$ defined as

$$S_m^{(n)} = \Big\{\sum_{|j|\le K_n} a_{m,j}\varphi_{m,j},\ a_{m,j} \in \mathbb{R}\Big\}, \quad\text{where } K_n \text{ is an integer.}$$

It is easy to see that $\{\varphi_{m,j}\}_{|j|\le K_n}$ is an orthonormal basis of $S_m^{(n)}$ and the orthogonal projection
$g_m^{(n)}$ of $g$ on $S_m^{(n)}$ is given by $g_m^{(n)} = \sum_{|j|\le K_n} a_{m,j}\varphi_{m,j}$ with $a_{m,j} = \langle g, \varphi_{m,j}\rangle$.

Associate this collection of models to the following contrast function, for $t$ belonging to $S_m^{(n)}$:

$$\gamma_n(t) = \frac{1}{n}\sum_{i=1}^n \left[\|t\|^2 - 2u_t^*(Z_i)\right], \quad\text{with}\quad u_t(x) = \frac{1}{2\pi}\,\frac{t^*(-x)}{f_\varepsilon^*(\sigma x)}.$$

By using Parseval and inverse Fourier formulas we get that

$$\mathbb{E}[u_t^*(Z_i)] = \langle t, g\rangle,$$

and hence $\mathbb{E}(\gamma_n(t)) = \|t - g\|^2 - \|g\|^2$, which is minimal when $t = g$. This shows that $\gamma_n(t)$ is
well suited to the estimation of $g$.

2.3 Construction of the minimum contrast estimators 

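Associated to the collection of models, the collection of non-penalized estimators $\hat g_m^{(n)}$ of $g$ is defined
by

$$\hat g_m^{(n)} = \arg\min_{t\in S_m^{(n)}} \gamma_n(t). \qquad (1)$$

By using that $t \mapsto u_t$ is linear, and that $\{\varphi_{m,j}\}_{|j|\le K_n}$ is an orthonormal basis of $S_m^{(n)}$, we have
$\hat g_m^{(n)} = \sum_{|j|\le K_n} \hat a_{m,j}\varphi_{m,j}$ where $\hat a_{m,j} = n^{-1}\sum_{i=1}^n u^*_{\varphi_{m,j}}(Z_i)$ and $\mathbb{E}(\hat a_{m,j}) = \langle g, \varphi_{m,j}\rangle = a_{m,j}$.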
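
As a rough numerical illustration of this construction (a sketch, not the authors' Matlab implementation), the coefficients can be approximated by quadrature. Since $\varphi^*_{m,j}(x) = L_m^{-1/2} e^{ijx/L_m}\mathbf{1}_{[-\pi L_m, \pi L_m]}(x)$, the change of variable $x = L_m v$ gives $u^*_{\varphi_{m,j}}(z) = \frac{\sqrt{L_m}}{2\pi}\int_{-\pi}^{\pi} e^{iv(L_m z - j)}/f_\varepsilon^*(\sigma L_m v)\,dv$, and by linearity $\gamma_n(\hat g_m^{(n)}) = -\sum_{|j|\le K_n}\hat a_{m,j}^2$. The function names, the trapezoidal rule and the grid size below are assumptions made for the example.

```python
import numpy as np

def laplace_cf(x):
    """Fourier transform of the unit-variance Laplace density exp(-sqrt(2)|x|)/sqrt(2)."""
    return 1.0 / (1.0 + np.asarray(x, dtype=float) ** 2 / 2.0)

def coefficients(z, L, sigma, K, noise_cf, n_grid=2001):
    """Approximate a_hat_{m,j} for |j| <= K and L_m = L (hypothetical helper).

    Uses a_hat_j = sqrt(L)/(2 pi) * int_{-pi}^{pi} e^{-ivj} ecf(v) / f_eps^*(sigma L v) dv,
    where ecf(v) is the empirical mean of exp(i v L Z_i).
    """
    z = np.asarray(z, dtype=float)
    v = np.linspace(-np.pi, np.pi, n_grid)
    w = np.full(n_grid, v[1] - v[0])      # trapezoidal weights
    w[0] *= 0.5
    w[-1] *= 0.5
    inv_cf = 1.0 / noise_cf(sigma * L * v)
    ecf = np.exp(1j * np.outer(v, L * z)).mean(axis=1)
    js = np.arange(-K, K + 1)
    phase = np.exp(-1j * np.outer(js, v))
    a_hat = np.sqrt(L) / (2.0 * np.pi) * (phase * (w * inv_cf * ecf)).sum(axis=1)
    return js, a_hat.real                 # the integral is real for real-valued data

def contrast_at_minimum(a_hat):
    """Value of gamma_n at the minimizer: minus the sum of squared coefficients."""
    return -float(np.sum(np.asarray(a_hat) ** 2))
```

When $\sigma = 0$ the integrand reduces to $e^{iv(L_m z - j)}$ and the sketch returns the empirical means of $\varphi_{m,j}(Z_i)$, which gives a simple sanity check.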

2.4 Construction of the minimum penalized contrast estimator 

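We aim at finding the best model $\hat m$ in $\mathcal{M}_n$, based on the data and not on prior knowledge on
the smoothness of $g$, such that the risk of the resulting estimator is almost as good as the risk of
the best estimator in the family. The model selection is performed in an automatic way, using the
following penalized criterion

$$\tilde g = \hat g_{\hat m}^{(n)} \quad\text{with}\quad \hat m = \arg\min_{m\in\mathcal{M}_n}\left[\gamma_n(\hat g_m^{(n)}) + \mathrm{pen}(m)\right], \qquad (2)$$

where the penalty function $\mathrm{pen}(m)$ is defined by

$$\mathrm{pen}(m) = 2a(\lambda_1 + \mu\sigma^\delta\pi^\delta\lambda_2)\, L_m^{\max(0,\min(3\delta/2 - 1/2,\,\delta))}\,\frac{\Gamma(m)}{n}, \qquad (3)$$

the constant $a$ is a fixed universal constant (to be found by simulation experiments),

$$\lambda_1(\gamma, \kappa_0, \mu, \sigma, \delta) = \frac{(\sigma^2\pi^2 + 1)^\gamma}{\pi^\delta \kappa_0^2 R(\mu, \delta, \sigma)}, \quad R(\mu, \delta, \sigma) = \mathbf{1}_{\{\delta = 0\}} + 2\mu\delta\sigma^\delta \mathbf{1}_{\{0 < \delta \le 1\}} + 2\mu\sigma^\delta \mathbf{1}_{\{\delta > 1\}}, \qquad (4)$$

$$\lambda_2 = \lambda_1^{1/2}(1 + \sigma^2\pi^2)^{\gamma/2}\,\|f_\varepsilon\|\,\kappa_0^{-1}(2\pi)^{-1/2}\,\mathbf{1}_{\{1/3 < \delta \le 1\}} + \lambda_1 \mathbf{1}_{\{\delta > 1\}},$$

$$\text{and}\quad \Gamma(m) = L_m^{2\gamma + 1 - \delta}\exp\{2\mu\sigma^\delta\pi^\delta L_m^\delta\}. \qquad (5)$$

Since $\sigma$ and $f_\varepsilon$ are known, the constants $\sigma$ and $\mu, \delta, \kappa_0, \gamma$ defined in $(A_2^\varepsilon)$ are also known.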
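
The following sketch shows how the quantities entering the penalty and the selection rule could be evaluated. It follows the formulas for $\lambda_1$, $R$, $\Gamma(m)$ and $\mathrm{pen}(m)$ as read here; the function names are hypothetical, $\lambda_2$ is passed in by the caller because it requires $\|f_\varepsilon\|$, and $\sigma > 0$ is assumed whenever $\delta > 0$.

```python
import math

def R_const(mu, delta, sigma):
    """R(mu, delta, sigma): 1 if delta = 0, with separate cases 0 < delta <= 1 and delta > 1."""
    if delta == 0:
        return 1.0
    if delta <= 1:
        return 2.0 * mu * delta * sigma ** delta
    return 2.0 * mu * sigma ** delta

def lambda_1(gamma, kappa0, mu, sigma, delta):
    """lambda_1 = (sigma^2 pi^2 + 1)^gamma / (pi^delta kappa0^2 R)."""
    num = (sigma ** 2 * math.pi ** 2 + 1.0) ** gamma
    return num / (math.pi ** delta * kappa0 ** 2 * R_const(mu, delta, sigma))

def gamma_m(L, gamma, mu, sigma, delta):
    """Gamma(m) = L^(2 gamma + 1 - delta) exp(2 mu sigma^delta pi^delta L^delta)."""
    expo = 2.0 * mu * sigma ** delta * math.pi ** delta * L ** delta
    return L ** (2 * gamma + 1 - delta) * math.exp(expo)

def penalty(L, n, a, gamma, kappa0, mu, sigma, delta, lambda_2=0.0):
    """pen(m) for L_m = L; lambda_2 is supplied by the caller (assumption)."""
    power = max(0.0, min(1.5 * delta - 0.5, delta))
    lam1 = lambda_1(gamma, kappa0, mu, sigma, delta)
    factor = lam1 + mu * sigma ** delta * math.pi ** delta * lambda_2
    return 2.0 * a * factor * L ** power * gamma_m(L, gamma, mu, sigma, delta) / n

def select_model(contrasts, penalties):
    """Return the m minimizing contrast(m) + pen(m); both arguments are dicts keyed by m."""
    return min(contrasts, key=lambda m: contrasts[m] + penalties[m])
```

For ordinary smooth errors ($\mu = \delta = 0$) the term in $\lambda_2$ vanishes, so the default `lambda_2=0.0` is harmless in that case.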

3. RATES OF CONVERGENCE OF THE MINIMUM CONTRAST ESTIMATORS 

3.1 Bias-variance decomposition of the risk of $\hat g_m^{(n)}$

Let us first study the rate of convergence of one estimator $\hat g_m^{(n)}$, when the smoothness of $g$ is known.

Proposition 1. Under Assumption $(A_3^X)$, denote by $\Delta_1(m) = L_m\int_{-\pi}^{\pi} |f_\varepsilon^*(L_m x\sigma)|^{-2}\,dx/(2\pi)$.
Then $\mathbb{E}\|g - \hat g_m^{(n)}\|^2 \le \|g - g_m\|^2 + (\pi L_m)^2(M_2 + 1)/K_n + 2\Delta_1(m)/n$.

Remark 1. We point out that the $\{\varphi_{m,j}\}$ are $\mathbb{R}$-supported (and not compactly supported) so that
we obtain an estimation on $\mathbb{R}$ and not only on a compact set as for usual projection estimators. This
is a great advantage of this basis. Nevertheless it induces the residual term $(\pi L_m)^2(M_2 + 1)/K_n$,
due to the truncation $|j| \le K_n$. But the most important thing is that the choice of $K_n$ does not
influence the other terms. Consequently, it is easy to check that we can find a relevant choice of
$K_n$ ($K_n \ge n$ under $(A_3^X)$) that makes this last supplementary term unconditionally negligible with
respect to the others. The choice of a large $K_n$ does not change the efficiency of our estimator from
a statistical point of view but only changes some practical computations.

Let us comment on the three terms in the bound of the risk. The variance term $\Delta_1(m)/n$ depends
on the rate of decay of the Fourier transform of $f_\varepsilon$, with larger variance for smoother $f_\varepsilon$. Under
$(A_2^\varepsilon)$, by applying Lemma 3 in Section 6.3, we get that $\Delta_1(m) \le \lambda_1\Gamma(m)$ where $\Gamma(m)$ is given
by (5) and $\lambda_1 = \lambda_1(\gamma, \kappa_0, \mu, \sigma, \delta)$ is given by (4). In order to ensure that $\Gamma(m_n)/n$ is bounded, we
only consider $L_m = m \le m_n$ with



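$$m_n \le \begin{cases} \pi^{-1} n^{1/(2\gamma+1)} & \text{if } \delta = 0, \\[4pt] \pi^{-1}\left[\dfrac{\ln(n)}{2\mu\sigma^\delta} + \dfrac{2\gamma + 1 - \delta}{2\delta\mu\sigma^\delta}\,\ln\left(\dfrac{\ln(n)}{2\mu\sigma^\delta}\right)\right]^{1/\delta} & \text{if } \delta > 0. \end{cases} \qquad (6)$$
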

Under $(A_3^X)$ and $(A_2^\varepsilon)$, if $K_n \ge n$, then we have

$$\mathbb{E}\|g - \hat g_m^{(n)}\|^2 \le \|g - g_m\|^2 + 2\lambda_1\Gamma(m)/n + (\pi L_m)^2(M_2 + 1)/n. \qquad (7)$$

Finally, since $g_m$ is the orthogonal projection of $g$ on $S_m$, we get that $g_m^* = g^*\mathbf{1}_{[-L_m\pi, L_m\pi]}$ and
therefore $\|g - g_m\|^2 = (2\pi)^{-1}\|g^* - g_m^*\|^2 = (2\pi)^{-1}\int_{|x| \ge \pi L_m} |g^*|^2(x)\,dx$.
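
As a small worked check of the bias formula, take the Cauchy density, for which $g^*(x) = e^{-|x|}$: the tail integral gives $\|g - g_m\|^2 = e^{-2\pi L_m}/(2\pi)$. The sketch below compares this closed form with a direct numerical evaluation; the helper name, the truncation point and the grid size are assumptions made for illustration.

```python
import numpy as np

def squared_bias(g_cf, L, upper=60.0, n_grid=200001):
    """(2 pi)^{-1} * integral of |g^*|^2 over |x| >= pi L, by the trapezoidal rule.

    The tail is truncated at `upper` (assumption) and doubled, since |g^*| is
    an even function for any real-valued density g.
    """
    x = np.linspace(np.pi * L, upper, n_grid)
    y = np.abs(g_cf(x)) ** 2
    h = x[1] - x[0]
    tail = h * (y.sum() - 0.5 * (y[0] + y[-1]))
    return 2.0 * tail / (2.0 * np.pi)

def cauchy_cf(x):
    """Fourier transform of the standard Cauchy density."""
    return np.exp(-np.abs(x))

def cauchy_squared_bias(L):
    """Closed form exp(-2 pi L) / (2 pi) for the Cauchy density."""
    return np.exp(-2.0 * np.pi * L) / (2.0 * np.pi)
```

The exponential decay in $L_m$ illustrates why very small models already have a negligible bias for such super smooth densities.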

3.2 Order of the risk of $\hat g_m^{(n)}$ under regularity assumptions on $g$

Under $(R_2^X)$ and $(A_2^\varepsilon)$, by choosing $\pi L_m = d$ and $K_n \ge n$, the bias term $\|g - g_m\|^2 = 0$, the
bound becomes $\mathbb{E}(\|g - \hat g_m^{(n)}\|^2) \le 2\lambda_1 d^{2\gamma+1-\delta}\exp\{2\mu\sigma^\delta\pi^\delta d^\delta\}/n + d^2(M_2 + 1)/(\pi^2 n)$, and
the density $g$ is estimated with the parametric rate of convergence. We refer to Ibragimov and
Hasminskii (1983) for a similar result on the "direct" estimation of a density $g$ satisfying Assumption
$(R_2^X)$, using the observations $X_1, \dots, X_n$.

If now $g$ satisfies $(R_1^X)$, $\|g - g_m\|^2 \le [C_1/(2\pi)](L_m^2\pi^2 + 1)^{-s}\exp\{-2b\pi^r L_m^r\}$. According to (7),
under $(A_2^\varepsilon)$ with $K_n \ge n$, the risk of $\hat g_m^{(n)}$ is bounded by

$$C_1(2\pi)^{-1}(L_m^2\pi^2 + 1)^{-s}\exp\{-2b\pi^r L_m^r\} + 2\lambda_1 L_m^{2\gamma+1-\delta}\exp\{2\mu\sigma^\delta\pi^\delta L_m^\delta\}/n + (\pi L_m)^2(M_2 + 1)/n.$$

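The optimal choices of $L_m$ and the resulting rates are given in Table 1, for different types of
smoothness of the unknown density $g$ and different types of known error density $f_\varepsilon$.

Table 1: Optimal choice of the length ($L_{\breve m}$) and resulting (optimal) rates under Assumptions $(A_2^\varepsilon)$
and $(R_1^X)$.

$r = 0$ (Sobolev($s$)), $\delta = 0$ (ordinary smooth):
    $\pi L_{\breve m} = O(n^{1/(2s+2\gamma+1)})$; rate $= O(n^{-2s/(2s+2\gamma+1)})$; minimax rate.

$r = 0$ (Sobolev($s$)), $\delta > 0$ (supersmooth):
    $\pi L_{\breve m} = [\ln(n)/(2\mu\sigma^\delta + 1)]^{1/\delta}$; rate $= O((\ln(n))^{-2s/\delta})$; minimax rate.

$r > 0$, $\delta = 0$ (ordinary smooth):
    $\pi L_{\breve m} = [\ln(n)/2b]^{1/r}$; rate $= O(\ln(n)^{(2\gamma+1)/r}/n)$; minimax rate.

$r > 0$, $\delta > 0$ (supersmooth):
    $L_{\breve m}$ solution of Equation (8); minimax rate if $r < \delta$ and $s = 0$.

Let us emphasize that the rate for $r > 0$, $\delta > 0$ is not explicitly given, but is only written in terms of the
solution $L_{\breve m}$ of the equation

$$L_{\breve m}^{2s+2\gamma+1-r}\exp\{2\mu\sigma^\delta(\pi L_{\breve m})^\delta + 2b\pi^r L_{\breve m}^r\} = O(n). \qquad (8)$$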

The study of this case is of great importance since the case $\delta > 0$ contains the most studied case
of Gaussian errors. The association of $\delta > 0$ and $r = 0$ usually leads people to conclude that this
problem is hopeless when $\delta > 0$ since the rates, of logarithmic order, are indeed very slow in
that case. But if we associate $\delta > 0$ with $r > 0$, then rates much faster than logarithmic are recovered
(see Section 3.4). The empirical experiments of Section 5 illustrate that the estimation algorithm
works well in that case. Lastly, we can mention that, in the context of stochastic volatility models
seen as processes observed with errors, most stationary distributions of standard diffusion models
studied by Comte and Genon-Catalot (2005) happen to belong to this class.
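
Since the bias-variance compromise for $r > 0$, $\delta > 0$ is only defined implicitly, a numerical root-finder gives a feel for its solution. The sketch below solves the compromise equation with the $O(n)$ replaced by exactly $n$ (an arbitrary normalization chosen for illustration), working on the log scale and using bisection on $L \ge 1$; the function names and the bracketing interval are assumptions.

```python
import math

def log_lhs(L, s, gamma, r, b, mu, sigma, delta):
    """Logarithm of L^(2s+2gamma+1-r) * exp(2 mu sigma^delta (pi L)^delta + 2 b (pi L)^r)."""
    return ((2 * s + 2 * gamma + 1 - r) * math.log(L)
            + 2.0 * mu * sigma ** delta * (math.pi * L) ** delta
            + 2.0 * b * (math.pi * L) ** r)

def solve_L(n, s, gamma, r, b, mu, sigma, delta, lo=1.0, hi=1e3, tol=1e-12):
    """Bisection for log_lhs(L) = log(n) on [lo, hi]; requires a sign change."""
    target = math.log(n)

    def f(L):
        return log_lhs(L, s, gamma, r, b, mu, sigma, delta) - target

    if f(lo) > 0 or f(hi) < 0:
        raise ValueError("no sign change on [lo, hi]; adjust the bracket")
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For example, `solve_L(1000, 0, 0, 2, 0.25, 0.5, 0.3, 2)` corresponds to Gaussian errors with $\sigma = 0.3$ and a density with $r = 2$, $b = 0.25$, $s = 0$; the solution grows only slowly with $n$.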

3.3 About the solution of Equation (8) in the case $r > 0$, $\delta > 0$

The special case $r = \delta > 0$ leads to the explicit solution

$$\pi L_{\breve m} = \left[\ln(n/\ln(n)^a)/(2\mu\sigma^\delta + 2b)\right]^{1/r} \quad\text{with}\quad a = (2s + 2\gamma - r + 1)/r \qquad (9)$$

and to the rate $[\ln(n)]^{a'} n^{-b/(b+\mu\sigma^\delta)}$ with $a' = (-2s\mu\sigma^\delta + (2\gamma - r + 1)b)/(r(\mu\sigma^\delta + b))$.

If $r > 0$, $\delta > 0$ and $r \ne \delta$, the expression of the optimal parameter $L_{\breve m}$, solution of Equation
(8), does not have one single form for general $r > 0$ and $\delta > 0$.
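- When $0 < r < \delta$, we can make the order of the rate precise by using some additional information
on the ratio $r/\delta < 1$. We have to distinguish whether $r/\delta \le 1/2$ or $1/2 < r/\delta \le 2/3$, .... More precisely,
if $r/\delta \le 1/2$, the optimal choice is

$$\pi L_{\breve m} = \left[\frac{\ln(n)}{2\mu\sigma^\delta} - \frac{2b}{2\mu\sigma^\delta}\left(\frac{\ln(n)}{2\mu\sigma^\delta}\right)^{r/\delta} - c\,\ln\left(\frac{\ln(n)}{2\mu\sigma^\delta}\right)\right]^{1/\delta} \quad\text{with}\quad c = \frac{2\gamma - r + 2s + 1}{2\delta\mu\sigma^\delta},$$

and the rate is of order

$$\ln(n)^{-2s/\delta}\exp\left[-2b\left(\ln(n)/(2\mu\sigma^\delta)\right)^{r/\delta}\right].$$

If $1/2 < r/\delta \le 2/3$ the optimal choice of $\pi L_{\breve m}$ is

$$\pi L_{\breve m} = \left[\frac{\ln(n)}{2\mu\sigma^\delta} - \frac{2b}{2\mu\sigma^\delta}\left(\frac{\ln(n)}{2\mu\sigma^\delta}\right)^{r/\delta} + \frac{r}{\delta}\,\frac{(2b)^2}{(2\mu\sigma^\delta)^2}\left(\frac{\ln(n)}{2\mu\sigma^\delta}\right)^{2r/\delta - 1} - c\,\ln\left(\frac{\ln(n)}{2\mu\sigma^\delta}\right)\right]^{1/\delta}$$

with the same $c$ as above, which gives the rate

$$\ln(n)^{-2s/\delta}\exp\left[-2b\left(\frac{\ln(n)}{2\mu\sigma^\delta}\right)^{r/\delta} + \frac{r}{\delta}\,\frac{(2b)^2}{2\mu\sigma^\delta}\left(\frac{\ln(n)}{2\mu\sigma^\delta}\right)^{2r/\delta - 1}\right].$$

If $2/3 < r/\delta \le 3/4$, we have another choice of $\pi L_{\breve m}$ with another rate.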

- When $0 < \delta < r$, we can also make the order of the rate of our estimator precise by using, once
again, some additional information on the ratio $\delta/r$. For instance, if $\delta/r \le 1/2$, the optimal choice
$L_{\breve m}$ is

$$\pi L_{\breve m} = \left[\frac{\ln(n)}{2b} - \frac{2\mu\sigma^\delta}{2b}\left(\frac{\ln(n)}{2b}\right)^{\delta/r} - c\,\ln\left(\frac{\ln(n)}{2b}\right)\right]^{1/r} \quad\text{with}\quad c = \frac{2\gamma - r + 2s + 1}{2br},$$

and the rate is of order

$$\ln(n)^{(2\gamma+1-\delta)/r}\exp\left[2\mu\sigma^\delta\left(\ln(n)/(2b)\right)^{\delta/r}\right]/n. \qquad (10)$$

As in the case $0 < r < \delta$, we obtain a different rate for $1/2 < \delta/r \le 2/3$.

It follows that in the case $r > 0$ and $\delta > 0$, the rate depends on the integer $k$ such that $r/\delta$ or
$\delta/r$ belongs to the interval $I_k = \,]k/(k+1); (k+1)/(k+2)]$. We are, to our knowledge, the first
to have noticed this (unavoidable) particularity of the rates.

3.4 About the optimality of $\hat g_{\breve m}^{(n)}$ when $g$ belongs to $\mathcal{S}_{s,r,b}(C_1)$

The rates $n^{-2s/(2s+2\gamma+1)}$ ($\delta = 0$, $r = 0$), $\ln(n)^{-2s/\delta}$ ($\delta > 0$, $r = 0$) and $\ln(n)^{(2\gamma+1)/r}/n$ ($\delta = 0$, $r > 0$)
are known to be the minimax rates and we refer to Fan (1991) (first two cases) and to Butucea
(2004) (last case) for lower bounds.

The optimality of the rates in the case $\delta > 0$, $r > 0$ requires a specific discussion.

To our knowledge, the first paper dealing with the case where $g$ is super smooth ($r > 0$) is the
paper by Pensky and Vidakovic (1999). See Section 4.3 for a discussion of the rates they obtain
compared to ours.

The case $r = \delta = 1$ is studied by Tsybakov (2000) and Cavalier et al. (2003), in the case of
inverse problems with random noise. In this case and in both problems (density deconvolution
and inverse problem) the best compromise is explicit and so is the rate of convergence, of order
$n^{-b/(b+\mu\sigma)}[\ln n]^{(-2s\mu\sigma + 2b\gamma)/(\mu\sigma + b)}$. It is noteworthy that $\hat g_{\breve m}^{(n)}$ seems also to achieve the minimax
rate of convergence in this case.

When $0 < r < \delta$, some lower bounds are known in the special case $0 < r < \delta$ and $s = 0$.
According to Butucea and Tsybakov (2004), in this case, if we denote by $\pi L_{\breve m}$ the solution of
$2\mu\sigma^\delta(\pi L_{\breve m})^\delta + 2b(\pi L_{\breve m})^r = \ln n - (\ln\ln n)^2$, then the rate of convergence of $\hat g_{\breve m}^{(n)}$ is the minimax rate
of order $\exp\{-2b(\pi L_{\breve m})^r\}$. The rate of convergence is always of the order of a power of $\ln(n)$ multiplied
by an exponential term, that is, it decreases faster than any logarithmic function, but slower than any
power of $n$.

When $0 < \delta < r$, no lower bounds are available. In this case, the rate is of the order of a power of
$\ln(n)$ multiplied by a negative power of $n$ and by an exponential term.

3.5 Conclusion on the minimum contrast estimators $\hat g_m^{(n)}$

The estimator $\hat g_{\breve m}^{(n)}$ achieves the minimax rate in all cases where lower bounds are available
but its construction requires the knowledge of the smoothness of $g$. All those facts give strong
motivation to find some adaptive estimation procedure that does not require such prior smoothness
knowledge on $g$, and whose risk automatically achieves the minimax rate.

4. ADAPTIVE ESTIMATION 

4.1 Main result of adaptive estimation

We look for a penalty function, based on the observations and on $\sigma^{-1} f_\varepsilon(\cdot/\sigma)$, such that, for $K_n \ge n$,

$$\mathbb{E}\|g - \tilde g\|^2 \le \inf_{m\in\mathcal{M}_n}\left[\|g - g_m\|^2 + (\pi L_m)^2(M_2 + 1)/n + 2\lambda_1\Gamma(m)/n\right]. \qquad (11)$$

The following theorem describes the cases where the oracle inequality is reached.

Theorem 1. Under the assumptions $(A_2^\varepsilon)$ and $(A_3^X)$, consider the collection of estimators $\hat g_m^{(n)}$
defined by (1) with $K_n \ge n$ and $1 \le m \le m_n$ satisfying (6) if $\delta \le 1/3$ and, if $\delta > 1/3$,

$$m_n \le \pi^{-1}\left[\frac{\ln(n)}{2\mu\sigma^\delta} + \frac{2\gamma + 1 - \delta + \min((3\delta/2 - 1/2), \delta)}{2\delta\mu\sigma^\delta}\,\ln\left(\frac{\ln(n)}{2\mu\sigma^\delta}\right)\right]^{1/\delta}.$$

Let $\mathrm{pen}(m)$ be defined by (3) for some universal numerical constant $a > 1$. Then, $\tilde g = \hat g_{\hat m}^{(n)}$ defined
by (2) satisfies

$$\mathbb{E}(\|g - \tilde g\|^2) \le C_a \inf_{m\in\{1,\dots,m_n\}}\left[\|g - g_m\|^2 + \mathrm{pen}(m) + (\pi L_m)^2(M_2 + 1)/n\right] + a\kappa_a C/n, \qquad (12)$$

where $C_a = \max(\kappa_a^2, 2\kappa_a)$, $\kappa_a = (a + 1)/(a - 1)$ and $C$ is a constant depending on $f_\varepsilon$ and $\sigma$.
Obviously, Remark 1 still holds for the adaptive estimator.

The rates are easy to deduce from (12) as soon as $g$ belongs to some smoothness class, but the
procedure will reach the rate without requiring the knowledge of any smoothness parameter.



4.2 About the optimality of the adaptive estimator $\tilde g$

Rate of $\tilde g$ under $(R_2^X)$: no loss.

If $g$ satisfies $(R_2^X)$, then according to Section 3.2, $\|g - g_m\|^2 = 0$ as soon as $\pi L_m \ge d$, and the
parametric rate of convergence is automatically achieved without the knowledge of $C_2$ and $d$ and
especially without requiring to know that $(R_2^X)$ is fulfilled.

Rate of $\tilde g$ under $(R_1^X)$.

Under $(R_1^X)$, the rate of convergence of $\tilde g$ clearly depends on the order of the penalty compared
to the variance order $\Gamma(m)/n$. If $g$ satisfies $(R_1^X)$, $\|g - g_m\|^2 \le (C_1/2\pi)L_m^{-2s}\exp\{-2b\pi^r L_m^r\}$. For
instance, if $\delta = 0$, by associating the order of the bias to the value of $\mathrm{pen}(m)$, of order $\Gamma(m)/n$, we
obtain that the estimator $\tilde g$ automatically reaches the minimax rate $\ln(n)^{(2\gamma+1)/r}/n$, without the
knowledge of $s$, $r$ or $b$. In all cases, $\tilde g$ achieves the minimax rate up to some logarithmic factor.

Rate of $\tilde g$ under $(R_1^X)$, cases without loss.

When $0 \le \delta \le 1/3$, the penalty function has the variance order $\Gamma(m)/n$, and $\tilde g$ achieves the best
rate of $\hat g_{\breve m}^{(n)}$. Under $(R_1^X)$, this best rate is the minimax rate in all cases here, except if $r > \delta > 0$
and $\delta \le 1/3$, which is a case where no lower bounds are available.

When $\delta > 1/3$, the penalty function $\mathrm{pen}(m)$ does not have exactly the order of the variance $\Gamma(m)/n$,
but a loss of order $L_m^{\min((3\delta/2 - 1/2), \delta)}$ occurs, that is of order $L_m^{(3\delta-1)/2}$ if $1/3 < \delta \le 1$ and of order
$L_m^\delta$ if $\delta > 1$. Consequently $\tilde g$ achieves the best rate of $\hat g_{\breve m}^{(n)}$ if the bias $\|g - g_m\|^2$ is the dominating
term in the trade-off between $\|g - g_m\|^2$ and $\mathrm{pen}(m)$.

- When $r = 0$ and $\delta > 1/3$, the minimax rate of order $(\ln(n))^{-2s/\delta}$ is given by the bias term,
and the loss in the penalty function does not change the rate achieved by the adaptive estimator
$\tilde g$, which thus remains the minimax rate.

- When $0 < r < \delta$, the rate is given by the bias term and thus this loss does not affect the rate
of convergence of $\tilde g$ either. Therefore, $\tilde g$ achieves the best rate of $\hat g_{\breve m}^{(n)}$, which is the minimax rate of
convergence when $s = 0$ and also probably if $s \ne 0$. In the specific case $0 < r < \delta/2$ and $s = 0$,
Butucea and Tsybakov (2004) also propose an adaptive estimator. But this requires knowing that
$0 < r < \delta/2$ and $s = 0$.

Rate of $\tilde g$ under $(R_1^X)$, case with loss.

- When $r \ge \delta > 1/3$, $\mathrm{pen}(m)$ can be the dominating term in the trade-off between $\|g - g_m\|^2$
and $\mathrm{pen}(m)$. This induces a loss of order $L_{\breve m}^{\min((3\delta/2 - 1/2), \delta)}$ in the rate of convergence of $\tilde g$ compared
to the best rate of $\hat g_{\breve m}^{(n)}$. Since it happens in cases where the order of the optimal $L_{\breve m}$ is less than
$(\ln n)^{1/\delta}$, the loss in the rate is at most of order $\ln n$, when the rate is faster than logarithmic and
consequently, the loss appears only in cases where it can be seen as negligible.

For $L_2$ estimation, such an unavoidable logarithmic loss in adaptation has been pointed out by
Tsybakov (2000) and Cavalier et al. (2003) in the case of inverse problems with random noise, when
$r = \delta = 1$, which shows, in a slightly different model but with comparable rates of convergence,
that a loss due to adaptivity of order $\ln(n)^{b/(\mu\sigma + b)}$ is unavoidable. The main point is that, according
to (9), our estimator has its quadratic risk with the same logarithmic loss when $r = \delta = 1$.
This logarithmic loss due to adaptation seems thus unavoidable at least in one case.

Remark 2. When $\sigma = 0$, then by convention $\delta = \mu = 0$, $\lambda_1 = 1$ and $\mathrm{pen}(m) = 6aL_m/n$, which is
the penalty function used in direct density estimation. More precisely, if $\sigma$ is very small, then the
procedure selects the parameter $L_{\hat m}$ close to the parameter selected in usual density estimation.

4.3 Comparison with Pensky and Vidakovic (1999)

To our knowledge, the first paper dealing with adaptive density deconvolution is the paper by
Pensky and Vidakovic (1999), who are also the first to consider the case $r > 0$. The adaptive
estimators proposed in Pensky and Vidakovic (1999) achieve minimax rates of convergence in the
three cases ($\delta = 0$, $r = 0$), ($\delta = 0$, $r > 0$), and ($\delta > 0$, $r = 0$).

But when ($r > 0$, $\delta > 0$), the rate of convergence of their estimator is not minimax. This is
shown in the special case $0 < r < \delta$ and $s = 0$, in Butucea and Tsybakov (2004), where sharp
minimax results are stated. This is also shown by our results when $0 < \delta < r$ and when $0 < r < \delta$,
$s \ne 0$ (see Sections 3.4 and 4.2). For instance, when $0 < \delta/r \le 1/2$, according to (10) and Sections
3.3 and 4.2, the resulting rate of $\tilde g$ is of order

$$\ln(n)^{\max(0,\min(3\delta/2 - 1/2,\,\delta))/r}\,\ln(n)^{(2\gamma+1-\delta)/r}\exp\left[2\mu\sigma^\delta\left(\ln(n)/(2b)\right)^{\delta/r}\right]/n,$$

strictly faster than the upper bound of the rate in Pensky and Vidakovic (1999) (see their Theorem
4), which is of order $\ln(n)^{(2\gamma+1)/\delta}/n^{A/(A + 2\mu\sigma^\delta(4\pi/3)^\delta)}$ for $A > 0$.

The non-optimality of their adaptive estimator when ($\delta > 0$, $r > 0$) comes from two facts.
First, when ($\delta > 0$, $r > 0$), they choose a smoothing parameter (analogous to $L_m$) as in the case
($r = 0$, $\delta > 0$). Consequently, it provides an adaptive estimator in the sense that it does not depend
on the smoothness parameters of $g$. But it does not give the best rate for their estimator, since it
does not correspond to the best choice in their bias-variance compromise.

Second, this non-optimality of their estimator when $\delta > 0$, $r > 0$ also comes, in a more
crucial manner, from the fact that their wavelet and scaling functions cannot provide the optimal
bias-variance decomposition. This is due to the support of the Fourier transform of their scaling
function as well as of their wavelet, which induce, when $\delta > 0$, $r > 0$, a squared bias term of order
$L_m^{-2s}\exp\{-2b(2\pi/3)^r L_m^r\}$ with a variance term of order $L_m^{2\gamma+1-\delta}\exp\{2\mu\sigma^\delta(4\pi/3)^\delta L_m^\delta\}$. When either
($\delta = 0$, $r = 0$), ($\delta = 0$, $r > 0$) or ($\delta > 0$, $r = 0$), those supports have no influence on the rate of
convergence, and hence their estimator is minimax. But these supports do not allow them to reach the
minimax rate when ($\delta > 0$, $r > 0$).

The asymptotic properties of $\tilde g$ are improved by using the basis generated by $\sin(\pi x)/(\pi x)$.
Indeed, due to its Fourier transform, it implies a squared bias of order $L_m^{-2s}\exp\{-2b\pi^r L_m^r\}$ and
a variance of order $L_m^{2\gamma+1-\delta}\exp\{2\mu\sigma^\delta\pi^\delta L_m^\delta\}$ and hence a better trade-off between the two terms.
Section 3.3 as well as the results of Butucea and Tsybakov (2004) illustrate that the best choice of $L_{\breve m}$,
solution of the bias-variance compromise (see Equation (8)), requires quite precise computations.
Besides its simplicity, this basis thus seems the most relevant since it gives the minimax rates
in all the cases where lower bounds are available and faster rates than the ones in Pensky and
Vidakovic (1999) in the remaining case.

5. SIMULATION STUDY 

The implementation is conducted by using Matlab software. Details about the algorithm can
be obtained from the authors upon request. We choose $K_n = 2^8$ as being of order $O(n)$ in all cases.

The integrated squared error $\mathrm{ISE}(g, \hat g_{\hat m}^{(n)}) = \|\hat g_{\hat m}^{(n)} - g\|^2$ is computed via a standard approximation
and discretization of the integral on an interval of $\mathbb{R}$ denoted by $I$ and given in each case.

Then the MISE, $\mathrm{MISE}(\hat g_{\hat m}^{(n)}) = \mathbb{E}\|\hat g_{\hat m}^{(n)} - g\|^2$, is computed as the empirical mean of the
approximated ISE $\|\hat g_{\hat m}^{(n)} - g\|^2$, over 500 simulation samples. We illustrate our method on some test
densities, with various smoothness properties, and for the two types of errors, ordinary and super
smooth. We start by describing the error densities and the associated penalties.
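
A minimal sketch of this ISE/MISE approximation is given below. It assumes a uniform grid on the interval $I$ and the trapezoidal rule, which is one possible "standard approximation"; the function names and the number of grid points are hypothetical choices, not those of the original Matlab code.

```python
import numpy as np

def ise(g, g_hat, interval, n_points=2001):
    """Approximate the integral of (g_hat - g)^2 over `interval` by the trapezoidal rule.

    `g` and `g_hat` are vectorized callables; `interval` is a pair (a, b).
    """
    a, b = interval
    x = np.linspace(a, b, n_points)
    d2 = (np.asarray(g_hat(x), dtype=float) - np.asarray(g(x), dtype=float)) ** 2
    h = x[1] - x[0]
    return float(h * (d2.sum() - 0.5 * (d2[0] + d2[-1])))

def mise(g, estimators, interval, n_points=2001):
    """Empirical mean of the approximated ISE over a list of fitted estimators."""
    return float(np.mean([ise(g, e, interval, n_points) for e in estimators]))
```

In a simulation study, `estimators` would hold one fitted estimator per simulated sample (500 of them in the setting described above).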

5. 1 Two settings for the errors and the associated penalties 

We consider two types of error density f E , the first one is ordinary smooth, with polynomial decay 
of the Fourier Transform, and the second one is supersmooth, with an exponential decay of the 
Fourier transform /*. 
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• Case 1: Laplace (or Double exponential) e's. In this case, f e (x) = e ^l x l/\/2, and 
f* e {x) = {l + x 2 /2)-\ 

This density satisfies (A2) with 7 = 2, k = 1/2 and \i = S = 0. 
According to Theorem 1, the penalty function, as the variance, is of order 



n 



f 

J — 7T 



<p*{x) 



f*(aL m x) 



dx, where 



/' 

J —IT 



ip*(x) 



f*(aL m x) 



2 / 2 

dx = 2-k 1 + — o- 2 L r 



20 



Some intensive simulation studies on various tested densities lead to choose the following penalty 



pen(L m ) 



6irL r 



1 + 



(ln(L m )) 



2.5 



7T 



Jr 2 , 1" 4 7- 4 



• Case 2: Gaussian e's. In that case, f e (x) = l/y/2ne~ x2 / 2 , and f*(x) = e~ x2 / 2 . 
This density satisfies (A|) with 7 = 0, hq = 1, 5 = 2 and /li = 1/2. 
According to Theorem 1, the penalty, slightly bigger than the variance term, is of order 









f*(aL m x) 



dx where 



<p*{x) 



f e (aL m x) 



dx = exp(a 2 L Ti 

J — TT 



2_2 



x )dx. 



As in the previous case, some intensive simulation studies on various tested densities lead to choose 
the following penalty 



pen(L m ) = 



67ri r 



1 



(ln(£ m )) 2 - 5 | 7r 2 a 2 L r , 



J cxp(<T 2 L m 2 x 2 )dx/ir^J , 



where the integral is numerically computed. According to the theory (see Theorem 1, the loss due 
to the adaptation is the term -K 2 a 2 L m 2 /3. 

The additional term $(\ln L_m)^{2.5}/L_m$ is motivated by the work of Birgé and Rozenholc (2005). In our case also, this term improves the quality of the results by making the penalties slightly heavier when $L_m$ becomes large.

Note that when $\sigma = 0$, both penalties are equal to $(6\pi L_m)(1 + (\ln L_m)^{2.5}/L_m)/n$.
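The two penalties can be explored numerically with a short Python sketch. It is only an illustration under stated assumptions: it takes the Laplace penalty to be $(6\pi L_m/n)(1 + (\ln L_m)^{2.5}/L_m + \pi^2\sigma^2L_m^2/3 + \pi^4\sigma^4L_m^4/20)$ and the Gaussian penalty to be $(6\pi L_m/n)(1 + (\ln L_m)^{2.5}/L_m + \pi^2\sigma^2L_m^2/3)\,\pi^{-1}\int_0^{\pi}\exp(\sigma^2L_m^2x^2)dx$. The function names are hypothetical, and `L` is assumed to be at least 1 so that $(\ln L)^{2.5}$ is real.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def integral_laplace(sigma, L):
    """Closed form of int_{-pi}^{pi} (1 + sigma^2 L^2 x^2 / 2)^2 dx."""
    s2 = (sigma * L) ** 2
    return 2 * math.pi * (1 + math.pi**2 * s2 / 3 + math.pi**4 * s2**2 / 20)


def pen_laplace(L, n, sigma):
    """Assumed reading of the Case 1 (Laplace errors) penalty, L >= 1."""
    s2 = (sigma * L) ** 2
    bracket = 1 + math.log(L) ** 2.5 / L + math.pi**2 * s2 / 3 + math.pi**4 * s2**2 / 20
    return 6 * math.pi * L / n * bracket


def pen_gauss(L, n, sigma):
    """Assumed reading of the Case 2 (Gaussian errors) penalty, L >= 1.

    The normalised integral (1/pi) int_0^pi exp(sigma^2 L^2 x^2) dx is
    computed numerically, as the text indicates.
    """
    s2 = (sigma * L) ** 2
    integral = simpson(lambda x: math.exp(s2 * x * x), 0.0, math.pi) / math.pi
    bracket = 1 + math.log(L) ** 2.5 / L + math.pi**2 * s2 / 3
    return 6 * math.pi * L / n * bracket * integral
```

With these readings, `pen_laplace(L, n, 0.0)` and `pen_gauss(L, n, 0.0)` coincide, in line with the remark on the case $\sigma = 0$.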



5.2 Test densities 



First we consider densities having classical smoothness properties like Hölderian smoothness with polynomial decay of their Fourier transform. Second we consider densities having stronger smoothness properties, with exponential decay of the Fourier transform. Except in the case of the infinite variance density (Cauchy density), we consider density functions $g$ normalized with unit variance so that $1/\sigma^2$ represents the usual signal-to-noise ratio (variance of the signal divided by the variance of the noise), denoted in the sequel by s2n, that is, s2n $= 1/\sigma^2$. The functions considered are listed below, together with the interval $I$ used to evaluate the ISE:

(a) Chi2(3)-type distribution, $X = U/\sqrt{6}$, $g_X(x) = \sqrt{6}\,g_U(\sqrt{6}x)$, $U \sim \chi^2(3)$ (the density $g_U$ is known in closed form), and $I = [-1, 16]$.

(b) Laplace distribution, $I = [-5, 5]$.

(c) Mixed Gamma distribution, $X = W/\sqrt{5.48}$ with $W \sim 0.4\,\Gamma(5, 1) + 0.6\,\Gamma(13, 1)$, and $I = [-1.5, 26]$.

(d) Cauchy distribution, $g(x) = (1/\pi)(1/(1 + x^2))$, $g^*(x) = e^{-|x|}$, $I = [-10, 10]$.

(e) Gaussian distribution, $X \sim \mathcal{N}(0, \sigma^2)$ with $\sigma = 1$, $I = [-4, 4]$.

(f) Mixed Gaussian distribution: $X \sim \sqrt{2}\,V$ with $V \sim 0.5\,\mathcal{N}(-3, 1) + 0.5\,\mathcal{N}(2, 1)$ and $I = [-8, 7]$.
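To make the simulation design concrete, here is a minimal Python sketch that draws noisy observations $Z = X + \sigma\varepsilon$ for three of the unit-variance test densities, namely (a), (b) and (e), with $\sigma = 1/\sqrt{\text{s2n}}$. It is not the authors' simulation code: the function names, the seed handling and the restriction to three densities are assumptions made for illustration.

```python
import math
import random


def laplace(rng):
    """Unit-variance Laplace draw: random sign times Exp(rate sqrt(2))."""
    magnitude = rng.expovariate(math.sqrt(2.0))
    return magnitude if rng.random() < 0.5 else -magnitude


def sample_x(name, rng):
    """Draw one X from a unit-variance test density."""
    if name == "chi2":      # (a): chi2(3) = Gamma(shape 3/2, scale 2), rescaled
        return rng.gammavariate(1.5, 2.0) / math.sqrt(6.0)
    if name == "laplace":   # (b)
        return laplace(rng)
    if name == "gauss":     # (e)
        return rng.gauss(0.0, 1.0)
    raise ValueError(f"unknown density: {name}")


def sample_z(name, n, s2n, noise="laplace", seed=0):
    """Return n observations Z_i = X_i + sigma * eps_i with sigma^2 = 1 / s2n."""
    rng = random.Random(seed)
    sigma = 1.0 / math.sqrt(s2n)
    if noise == "laplace":
        draw_eps = lambda: laplace(rng)
    else:
        draw_eps = lambda: rng.gauss(0.0, 1.0)
    return [sample_x(name, rng) + sigma * draw_eps() for _ in range(n)]
```

For instance, `sample_z("chi2", 750, 10)` mimics the setting of density (a) with Laplace errors, $n = 750$ and s2n $= 10$.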

Figure 1: Plots of the estimator (dotted line) and of the true $\chi^2(3)$ density (a) (full line), Laplace errors, $n = 750$, s2n $= 10$, when $L_1 = 1$ (left), $L_2 = 2$ (middle), $L_3 = 3$ (right). The algorithm chooses $L_{\hat m} = 2$.



Densities (a), (b), (c) correspond to cases with $r = 0$, whereas densities (d), (e), (f) correspond to cases with $r > 0$.



5.3 Results 

Figure 1 compares the estimators $\hat g_m$ obtained for $m = L_m = 1, 2$ and 3, and justifies the choice $\hat m = 2$ made by the algorithm. Table 2 presents the MISE for the two types of errors, the different test densities, and different values of s2n and of the sample size. The greatest values of s2n amount to considering that there is essentially no noise. Clearly the MISE is smaller when there is less noise ($\sigma$ small, s2n large).

We can in particular compare the performance of our adaptive estimator with that of the deconvolution kernel estimator as presented in Delaigle and Gijbels (2004a). This comparison is done for densities (a), (c), (e) and (f), which correspond to the densities #2, #6, #1 and #3 respectively in Delaigle and Gijbels (2004a). They give median ISE obtained with kernel estimators by using four different methods of bandwidth selection. The comparison is given in Table 3, where our median ISE is computed for 500 samples generated with the same interval length and signal-to-noise ratio as in Delaigle and Gijbels (2004a). The ISE are computed on the same intervals $I$ as theirs. Besides our medians, we also give our corresponding means, which we believe are more meaningful since the MISE is $\mathbb{E}\|\tilde g - g\|^2$.
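The distinction between mean and median ISE can be sketched in a few lines of Python. The code below approximates the ISE on an interval $I = [a, b]$ by the trapezoidal rule and summarizes a list of ISE values; the helper names `ise` and `summarize`, and the choice of the trapezoidal rule, are illustrative assumptions rather than the procedure actually used for the tables.

```python
import statistics


def ise(estimate, truth, a, b, num=2000):
    """Trapezoidal approximation of int_a^b (estimate(x) - truth(x))^2 dx."""
    h = (b - a) / num
    values = [(estimate(a + k * h) - truth(a + k * h)) ** 2 for k in range(num + 1)]
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def summarize(ise_values):
    """Mean (an empirical MISE) and median of a list of ISE values."""
    return {
        "mean": statistics.mean(ise_values),
        "median": statistics.median(ise_values),
    }
```

Because ISE distributions are typically right-skewed, the mean reported by `summarize` is usually larger than the median, which is why a method compared through its means against a competitor's medians is being judged conservatively.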

We can see that our estimation procedure provides better results in all cases except one, namely when we aim at estimating a Gaussian density, for both types of error density. This is most probably due to the fact that the bandwidth selection methods are based on computations assuming that the underlying density is Gaussian, so that they perform very well when this is true. For the other cases, even our means are often better than Delaigle and Gijbels' (2004a) medians, which shows that our method provides a very good solution to the deconvolution problem.

A standard objection to deconvolution methods is that they require knowledge of the noise density. Therefore, following the ideas of Meister (2004), we study here the properties of the estimator when the error density is not correctly specified. For both types of errors, we study the behavior of the estimator built with one type of error density when the other type is the true one. Table 4 presents the ratio between the resulting MISE if the errors density is
Table 2: Mean MISE ×100 obtained with N = 500 samples, for sample sizes n = 100, 250, 500, 1000, 2500 and s2n = 2, 4, 10, 100, 1000; the higher s2n, the lower the noise level. Densities (a): Chi2(3), (b): Laplace, (c): Mixed Gamma, (d): Cauchy, (e): Gaussian, (f): Mixed Gaussian.

×10^-2      n = 100         n = 250         n = 500         n = 1000        n = 2500
g    s2n    Lap.    Gaus.   Lap.    Gaus.   Lap.    Gaus.   Lap.    Gaus.   Lap.    Gaus.

(a)  2      2.02    4.15    1.39    2.37    1.18    1.72    1.06    1.36    1.03    1.12
     4      1.52    1.79    1.21    1.27    1.07    1.13    1.04    1.04    0.654   0.996
     10     1.31    1.31    1.13    1.11    1.01    1.03    0.505   0.995   0.345   0.974
     10^2   1.22    1.23    0.72    0.884   0.409   0.411   0.327   0.335   0.179   0.232
     10^3   1.22    1.21    0.651   0.638   0.391   0.382   0.293   0.298   0.157   0.157

(b)  2      3.7     10.6    2.17    5.2     1.61    3.03    1.41    2.07    1.2     1.48
     4      2.5     2.99    1.66    1.93    1.33    1.46    1.26    1.25    0.817   1.12
     10     1.9     1.97    1.43    1.42    1.35    1.22    0.723   1.12    0.441   1.06
     10^2   1.69    1.64    0.883   1.06    0.607   0.538   0.453   0.385   0.343   0.211
     10^3   1.68    1.65    0.814   0.79    0.593   0.561   0.411   0.379   0.284   0.24

(c)  2      1.32    3.96    0.547   1.88    0.292   1.01    0.148   0.533   0.06    0.224
     4      0.79    1.05    0.316   0.453   0.151   0.224   0.0815  0.116   0.0361  0.0497
     10     0.495   0.524   0.194   0.215   0.103   0.11    0.0543  0.0565  0.024   0.0246
     10^2   0.369   0.384   0.152   0.149   0.0789  0.0785  0.0409  0.0412  0.0194  0.0186
     10^3   0.364   0.353   0.149   0.15    0.0762  0.0767  0.0404  0.0406  0.0184  0.0185

(d)  2      2.72    9.09    1.22    4.26    0.645   2.3     0.353   1.25    0.158   0.513
     4      1.66    2.27    0.716   0.967   0.364   0.514   0.205   0.28    0.138   0.127
     10     1.15    1.13    0.437   0.46    0.249   0.257   0.215   0.142   0.219   0.0764
     10^2   0.815   0.783   0.373   0.351   0.351   0.271   0.206   0.201   0.147   0.0962
     10^3   0.783   0.78    0.366   0.355   0.34    0.331   0.189   0.189   0.121   0.118

(e)  2      2.74    9.21    1.1     4.08    0.605   2.14    0.296   1.06    0.143   0.446
     4      1.59    2.23    0.591   0.878   0.362   0.457   0.229   0.227   0.463   0.0894
     10     0.885   1.02    0.397   0.42    0.372   0.21    0.515   0.112   0.229   0.046
     10^2   0.711   0.713   0.565   0.432   0.396   0.394   0.279   0.195   0.171   0.15
     10^3   0.739   0.705   0.606   0.592   0.352   0.355   0.259   0.246   0.167   0.145

(f)  2      2.97    9.98    1.26    4.45    0.693   2.31    0.328   1.26    0.132   0.509
     4      1.73    2.37    0.709   1.02    0.375   0.478   0.185   0.257   0.0751  0.105
     10     1.14    1.21    0.463   0.466   0.237   0.242   0.118   0.122   0.0468  0.0515
     10^2   0.851   0.817   0.359   0.352   0.166   0.167   0.0866  0.0867  0.034   0.0351
     10^3   0.823   0.828   0.344   0.327   0.169   0.163   0.0845  0.0839  0.0334  0.0336

not correct with the MISE if the errors density is correct. For instance, in the columns "$\varepsilon$ Lap." the noise density is Laplace but the MISE in the numerator of the ratio corresponds to estimators constructed as if it were Gaussian. As expected, since the construction uses the knowledge of the error density, if it is misspecified, the estimator presents some bias and the MISE becomes slightly bigger. Nevertheless, this difference does not clearly appear when $n$ is not very large. Indeed in that case, the optimal length $L_m$ is small and therefore the variance term, of order $\int_0^{\pi L_m}|f_\varepsilon^*(x)|^{-2}dx$, is not so different between the two errors.
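The claim that the variance terms of the two error types are close for small $L_m$ can be checked numerically. The sketch below compares $\int_0^{\pi L}|f_\varepsilon^*(\sigma x)|^{-2}dx$ for Gaussian and Laplace errors by a midpoint rule; including the noise level $\sigma$ inside the integrand, the function names and the quadrature rule are our assumptions for illustration.

```python
import math


def variance_integral(kind, L, sigma, n=4000):
    """Midpoint rule for int_0^{pi L} |f_eps^*(sigma x)|^{-2} dx.

    kind = "laplace": |f^*(u)|^{-2} = (1 + u^2 / 2)^2
    kind = "gauss":   |f^*(u)|^{-2} = exp(u^2)
    """
    upper = math.pi * L
    h = upper / n

    def inv_sq(x):
        s = (sigma * x) ** 2
        return (1 + s / 2) ** 2 if kind == "laplace" else math.exp(s)

    return h * sum(inv_sq((k + 0.5) * h) for k in range(n))


def ratio(L, sigma):
    """Gaussian variance integral divided by the Laplace one."""
    return variance_integral("gauss", L, sigma) / variance_integral("laplace", L, sigma)
```

Since $e^{s} \ge (1 + s/2)^2$ for $s \ge 0$, with equality up to second order at $s = 0$, `ratio(L, sigma)` stays close to 1 for small `L` and grows very quickly for large `L`, which is consistent with the misspecification effect becoming visible only for large sample sizes.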

Table 3: Median ISE obtained by Delaigle and Gijbels (2004a) (DG) with a kernel estimator and four different strategies of bandwidth selection, and with our penalized projection estimator (median and mean).

                                      n = 100             n = 250
density g       method                ε Lap.    ε Gaus.   ε Lap.    ε Gaus.

(a) or #2       DG, lower median      0.015     0.018     -         -
χ²(3)           DG, higher median     0.018     0.022     -         -
(s2n=4)         Proj.: median         0.014     0.016     -         -
                Proj.: mean           0.015     0.018     -         -

(c) or #6       DG, lower median      -         -         0.0021    0.0023
Mix. Gamma      DG, higher median     -         -         0.0024    0.0026
(s2n=10)        Proj.: median         -         -         0.0017    0.0020
                Proj.: mean           -         -         0.0019    0.0021

(e) or #1       DG, lower median      0.0071    0.0080    0.0041    0.0051
Gauss           DG, higher median     0.011     0.012     0.0059    0.0072
(s2n=4)         Proj.: median         0.012     0.017     0.0049    0.0066
                Proj.: mean           0.016     0.022     0.0059    0.0088

(f) or #3       DG, lower median      0.018     0.027     0.011     0.020
Mix. Gauss      DG, higher median     0.031     0.034     0.023     0.028
(s2n=4)         Proj.: median         0.016     0.022     0.0063    0.0088
                Proj.: mean           0.017     0.024     0.0071    0.010



Table 4: Ratio between MISE with misspecified error density (Laplace errors, $g$ estimated as if errors were Gaussian and reciprocally) and MISE with correctly specified error density.

×10^-2            n = 1000          n = 5000          n = 10000         n = 25000
g          s2n    ε Lap.   ε Gaus.  ε Lap.   ε Gaus.  ε Lap.   ε Gaus.  ε Lap.   ε Gaus.

Lapl.      2      1.6      1.4      2.2      1.8      2.3      2.9      2.4      4.5
           4      1        1.3      1        1.9      1        2.2      1        2.3

Mix. Gam.  2      1        1.1      1.3      1.6      1.6      2.1      2.2      3
           4      1        1        1.1      1.2      1        1.3      1.1      1.5

Cauchy     2      1.3      1.3      1.7      1.6      2.5      1.2      3.7      1.5
           4      1.1      1        1.2      1.1      1.3      1.1      1.4      1.2

Gauss      2      1.1      1.4      1.4      1.1      2        1        3.1      1.2
           4      1        0.81     1.2      1        1.2      1        1.8      1.3




Concluding remarks: Our estimation procedure provides an adaptive estimator which achieves the minimax rate of convergence (up to a possible logarithmic factor) in all the cases where lower bounds are available, without any prior smoothness knowledge on the unknown density $g$. In particular it solves, almost in the best way, the bias-variance problem when the best compromise would not be easily computable. Furthermore, this estimation procedure induces a fast practical algorithm with good practical results.

6. PROOFS 

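6.1 Proof of Proposition 1.

By definition of $\hat g_m^{(n)}$, for any given $m$ belonging to $\mathcal{M}_n$, $\hat g_m^{(n)}$ satisfies $\gamma_n(\hat g_m^{(n)}) - \gamma_n(g_m^{(n)}) \le 0$. Denoting by $\nu_n(t)$ the centered empirical process

$$\nu_n(t) = \frac{1}{n}\sum_{i=1}^{n}\left[u_t^*(Z_i) - \langle t, g\rangle\right], \tag{13}$$

we have that

$$\gamma_n(t) - \gamma_n(s) = \|t - g\|^2 - \|s - g\|^2 - 2\nu_n(t - s), \tag{14}$$

and therefore $\|g - \hat g_m^{(n)}\|^2 \le \|g - g_m^{(n)}\|^2 + 2\nu_n(\hat g_m^{(n)} - g_m^{(n)})$. Since $\hat a_{m,j} - a_{m,j} = \nu_n(\varphi_{m,j})$, we get that

$$\nu_n(\hat g_m^{(n)} - g_m^{(n)}) = \sum_{|j| \le K_n}(\hat a_{m,j} - a_{m,j})\,\nu_n(\varphi_{m,j}) = \sum_{|j| \le K_n}[\nu_n(\varphi_{m,j})]^2,$$

and consequently $\mathbb{E}\|g - \hat g_m^{(n)}\|^2 \le \|g - g_m^{(n)}\|^2 + 2\sum_{j \in \mathbb{Z}}\mathrm{Var}[\nu_n(\varphi_{m,j})]$. Now, since the $X_i$'s and the $\varepsilon_i$'s are independent and identically distributed random variables, we get that

$$\mathrm{Var}[\nu_n(\varphi_{m,j})] = n^{-2}\sum_{i=1}^{n}\mathrm{Var}\left[u^*_{\varphi_{m,j}}(Z_i)\right] = n^{-1}\mathrm{Var}\left[u^*_{\varphi_{m,j}}(Z_1)\right].$$

Apply Lemma 2 to get that $\sum_{j \in \mathbb{Z}}\mathrm{Var}[\nu_n(\varphi_{m,j})] \le \Delta_1(m)/n$, where $\Delta_1(m)$ is defined in Proposition 1. It remains to study $\|g - g_m^{(n)}\|^2$. By applying Pythagoras Theorem, we have $\|g - g_m^{(n)}\|^2 = \|g - g_m\|^2 + \|g_m - g_m^{(n)}\|^2$, where $\|g_m - g_m^{(n)}\|^2 = \sum_{|j| > K_n} a_{m,j}^2 \le (\sup_j j\,a_{m,j})^2\sum_{|j| > K_n} j^{-2}$. Now we write that

$$\begin{aligned}
j\,a_{m,j} &= j\sqrt{L_m}\int \varphi(L_m x - j)\,g(x)\,dx \\
&\le L_m^{3/2}\int |x|\,|\varphi(L_m x - j)|\,g(x)\,dx + \sqrt{L_m}\int |L_m x - j|\,|\varphi(L_m x - j)|\,g(x)\,dx \\
&\le L_m^{3/2}\left(\int |\varphi(L_m x - j)|^2 dx\right)^{1/2}\left(\int x^2 g^2(x)\,dx\right)^{1/2} + \sqrt{L_m}\,\sup_x |x\varphi(x)|.
\end{aligned}$$

This implies finally that $j\,a_{m,j} \le L_m (M_2)^{1/2} + \sqrt{L_m}$, and Proposition 1 follows. □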
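The last step of this proof relies on two elementary facts that can be checked numerically: the tail sum $\sum_{|j| > K_n} j^{-2}$ is at most $2/K_n$, and $\sup_x |x\varphi(x)| \le 1/\pi$ when $\varphi(x) = \sin(\pi x)/(\pi x)$. The Python sketch below assumes this sinc form for $\varphi$; the third helper, which checks that $(L\sqrt{M_2} + \sqrt{L})^2 \cdot 2/K \le (M_2 + 1)(\pi L)^2/K$ for $L \ge 1$, is our own reading of how the truncation term $(M_2+1)(\pi L_m)^2/K_n$ used later can be obtained, not a statement taken from the proof.

```python
import math


def phi(x):
    """Sinc function sin(pi x) / (pi x), assumed to be the basis generator."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def tail_sum(K, terms=200000):
    """Truncated value of sum_{|j| > K} j^{-2} (a lower estimate of the tail)."""
    return 2.0 * sum(1.0 / (j * j) for j in range(K + 1, K + 1 + terms))


def truncation_bound_holds(L, M2, K):
    """Check (L sqrt(M2) + sqrt(L))^2 * 2 / K <= (M2 + 1) (pi L)^2 / K for L >= 1."""
    left = (L * math.sqrt(M2) + math.sqrt(L)) ** 2 * 2.0 / K
    right = (M2 + 1.0) * (math.pi * L) ** 2 / K
    return left <= right
```

The third check works because $(L\sqrt{M_2} + \sqrt{L})^2 \le 2L^2(M_2 + 1)$ when $L \ge 1$, and $4 \le \pi^2$.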
6.2 Proof of Theorem 1 

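By definition, $\tilde g$ satisfies that for all $m \in \mathcal{M}_n$, $\gamma_n(\tilde g) + \mathrm{pen}(\hat m) \le \gamma_n(g_m^{(n)}) + \mathrm{pen}(m)$. Therefore, by applying (14) we get that

$$\|\tilde g - g\|^2 \le \|g_m^{(n)} - g\|^2 + 2\nu_n(\tilde g - g_m^{(n)}) + \mathrm{pen}(m) - \mathrm{pen}(\hat m).$$

Next, we use that if $t = t_1 + t_2$ with $t_1$ in $S_m^{(n)}$ and $t_2$ in $S_{m'}^{(n)}$, then $t$ is such that $t^*$ has its support in $[-\pi L_{\max(m,m')}, \pi L_{\max(m,m')}]$ and therefore $t$ belongs to $S_{\max(m,m')}^{(n)}$. If we denote by $B_{m,m'}(0,1)$ the set $B_{m,m'}(0,1) = \{t \in S_{\max(m,m')}^{(n)} \,/\, \|t\| = 1\}$, then $|\nu_n(\tilde g - g_m^{(n)})| \le \|\tilde g - g_m^{(n)}\| \sup_{t \in B_{m,\hat m}(0,1)}|\nu_n(t)|$.
Consequently, by using that $2uv \le a^{-1}u^2 + a v^2$, for $a > 1$, we get

$$\|\tilde g - g\|^2 \le \|g_m^{(n)} - g\|^2 + a^{-1}\|\tilde g - g_m^{(n)}\|^2 + a\sup_{t \in B_{m,\hat m}(0,1)}\nu_n^2(t) + \mathrm{pen}(m) - \mathrm{pen}(\hat m)$$

and therefore, by writing that $\|\tilde g - g_m^{(n)}\|^2 \le (1 + y^{-1})\|\tilde g - g\|^2 + (1 + y)\|g - g_m^{(n)}\|^2$, with $y = (a+1)/(a-1)$ for $a > 1$, we infer that

$$\|\tilde g - g\|^2 \le \left(\frac{a+1}{a-1}\right)^2\|g - g_m^{(n)}\|^2 + \frac{a(a+1)}{a-1}\sup_{t \in B_{m,\hat m}(0,1)}\nu_n^2(t) + \frac{a+1}{a-1}\left(\mathrm{pen}(m) - \mathrm{pen}(\hat m)\right).$$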
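The constants in the last display come from moving the $\|\tilde g - g\|^2$ term to the left-hand side. The following Python sketch, using exact rational arithmetic, verifies this algebra for the choice $y = (a+1)/(a-1)$; the function name `constants` is illustrative only.

```python
from fractions import Fraction


def constants(a):
    """Return (left, right, kappa) for a > 1.

    left  = 1 - (1 + 1/y)/a : coefficient of ||g~ - g||^2 once moved to the left
    right = 1 + (1 + y)/a   : coefficient of ||g - g_m^(n)||^2 on the right
    kappa = (a + 1)/(a - 1)
    """
    a = Fraction(a)
    y = (a + 1) / (a - 1)
    left = 1 - (1 + 1 / y) / a
    right = 1 + (1 + y) / a
    kappa = (a + 1) / (a - 1)
    return left, right, kappa
```

One finds `left == 1 / kappa` and `right == kappa`, so dividing through by `left` yields the factors $\kappa_a^2$, $a\kappa_a$ and $\kappa_a$ with $\kappa_a = (a+1)/(a-1)$.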

Choose some positive function $p(m,m')$ such that $a\,p(m,m') \le \mathrm{pen}(m) + \mathrm{pen}(m')$. Consequently, for $\kappa_a = (a+1)/(a-1)$ we have

$$\|\tilde g - g\|^2 \le \kappa_a^2\left[\|g - g_m\|^2 + \|g_m - g_m^{(n)}\|^2\right] + 2\kappa_a\,\mathrm{pen}(m) + a\kappa_a W_n(\hat m)$$

with

$$W_n(m') := \Big[\sup_{t \in B_{m,m'}(0,1)}|\nu_n(t)|^2 - p(m,m')\Big]_+, \tag{15}$$

that is, according to the proof of Proposition 1,

$$\|\tilde g - g\|^2 \le \kappa_a^2\|g - g_m\|^2 + \kappa_a^2(M_2 + 1)(\pi L_m)^2/K_n + 2\kappa_a\,\mathrm{pen}(m) + a\kappa_a\sum_{m' \in \mathcal{M}_n}W_n(m'). \tag{16}$$

The main point of the proof lies in studying $W_n(m')$, and more precisely in finding $p(m,m')$ such that for a constant $K$,

$$\sum_{m' \in \mathcal{M}_n}\mathbb{E}(W_n(m')) \le K/n. \tag{17}$$

In this case, combining (16) and (17) we infer that, for all $m$ in $\mathcal{M}_n$,

$$\mathbb{E}\|g - \tilde g\|^2 \le \kappa_a^2\|g - g_m\|^2 + \kappa_a^2(M_2 + 1)(\pi L_m)^2/K_n + 2\kappa_a\,\mathrm{pen}(m) + a\kappa_a K/n,$$

which can also be written

$$\mathbb{E}\|g - \tilde g\|^2 \le C_a\inf_{m \in \mathcal{M}_n}\left[\|g - g_m\|^2 + \mathrm{pen}(m) + (M_2 + 1)(\pi L_m)^2/K_n\right] + a\kappa_a K/n,$$

where $C_a = \max(\kappa_a^2, 2\kappa_a)$ suits. It remains thus to find $p(m,m')$ such that (17) holds. This will be done by applying the following immediate integration of Talagrand's Inequality (see Talagrand (1996)):

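Lemma 1. Let $Y_1, \ldots, Y_n$ be i.i.d. random variables and $r_n(f) = (1/n)\sum_{i=1}^{n}[f(Y_i) - \mathbb{E}(f(Y_i))]$ for $f$ belonging to a countable class $\mathcal{F}$ of uniformly bounded measurable functions. Then for $\xi^2 > 0$

$$\mathbb{E}\Big[\sup_{f \in \mathcal{F}}|r_n(f)|^2 - 2(1 + 2\xi^2)H^2\Big]_+ \le \frac{6}{K_1}\left(\frac{v}{n}e^{-K_1\xi^2\frac{nH^2}{v}} + \frac{8M_1^2}{K_1 n^2 C^2(\xi)}e^{-\frac{K_1 C(\xi)\xi}{\sqrt{2}}\frac{nH}{M_1}}\right), \tag{18}$$

with $C(\xi) = \sqrt{1 + \xi^2} - 1$, $K_1$ is a universal constant, and where

$$\sup_{f \in \mathcal{F}}\|f\|_\infty \le M_1, \quad \mathbb{E}\Big[\sup_{f \in \mathcal{F}}|r_n(f)|\Big] \le H, \quad \sup_{f \in \mathcal{F}}\mathrm{Var}(f(Y_1)) \le v.$$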
/e^ /e.F /e.F 
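Usual density arguments show that this result can be applied to the class of functions $\mathcal{F} = B_{m,m'}(0,1)$. Let us denote by $m^* = \max(m, m')$. Combining Lemma 3 and Lemma 4, we propose to take

$$H^2 = H^2(m^*) = \lambda_1 L_{m^*}^{2\gamma+1-\delta}\exp\{2\mu\sigma^\delta(\pi L_{m^*})^\delta\}/n \quad\text{and}\quad M_1 = \sqrt{nH^2},$$

where $\lambda_1 = \lambda_1(\gamma, \kappa_0, \mu, \sigma, \delta)$ is the constant defined previously. Again, by applying Lemma 4, we take $v \ge \sqrt{\Delta_2(m^*, h)}/(2\pi)$ with

$$\Delta_2(m, h) = L_m^2\iint\left|\frac{\varphi^*(x)\varphi^*(y)}{f_\varepsilon^*(\sigma L_m x)f_\varepsilon^*(\sigma L_m y)}h^*(L_m(x - y))\right|^2 dx\,dy. \tag{19}$$

For $\delta > 1$ we use a rough bound for $\Delta_2(m, h)$ given by $\sqrt{\Delta_2(m^*, h)} \le 2\pi n H^2$. When $\delta \le 1$, write that

$$\Delta_2(m, h) \le L_m^2\sup_{|y| \le \pi}|f_\varepsilon^*(\sigma L_m y)|^{-2}\int_{-\pi}^{\pi}\frac{dx}{|f_\varepsilon^*(\sigma L_m x)|^2}\int|h^*(L_m u)|^2 du \le 2\kappa_0^{-2}\pi\lambda_1(1 + \sigma^2\pi^2)^\gamma\|h^*\|^2 L_m^{4\gamma+1-\delta}\exp\{4\mu\sigma^\delta(\pi L_m)^\delta\}.$$

Using that $\|h^*\|^2 \le \|f_\varepsilon^*\|^2 < \infty$, we take $v = \lambda_2 L_{m^*}^{2\gamma+\min(1/2-\delta/2,\,1-\delta)}\exp\{2\mu\sigma^\delta(\pi L_{m^*})^\delta\}$, where $\lambda_2 = \lambda_2(\gamma, \kappa_0, \mu, \sigma, \delta)$ is defined in Theorem 1. From the definition (15) of $W_n(m')$, by taking $p(m, m') = 2(1 + 2\xi^2)H^2$, we get that

$$\mathbb{E}(W_n(m')) \le \mathbb{E}\Big[\sup_{t \in B_{m,m'}(0,1)}|\nu_n(t)|^2 - 2(1 + 2\xi^2)H^2\Big]_+.$$

By applying (18), we get the global bound $\mathbb{E}(W_n(m')) \le K[I(m^*) + II(m^*)]$, where $I(m^*)$ and $II(m^*)$ are defined by

$$I(m^*) = \frac{\lambda_2 L_{m^*}^{2\gamma+\min(1/2-\delta/2,\,1-\delta)}\exp\{2\mu\sigma^\delta(\pi L_{m^*})^\delta\}}{n}\exp\left\{-K_1\xi^2(\lambda_1/\lambda_2)L_{m^*}^{(1/2-\delta/2)_+}\right\}$$

and

$$II(m^*) = \frac{\lambda_1 L_{m^*}^{2\gamma+1-\delta}\exp\{2\mu\sigma^\delta(\pi L_{m^*})^\delta\}}{n^2}\exp\left\{-K_1\xi C(\xi)\sqrt{n}/\sqrt{2}\right\},$$

with $\lambda_2 = \lambda_2(\gamma, \kappa_0, \mu, \sigma, \delta)$ defined in Theorem 1.

• Study of $\sum_{m' \in \mathcal{M}_n}II(m^*)$. We have $\sum_{m' \in \mathcal{M}_n}II(m^*) \le |\mathcal{M}_n|\exp\{-K_1\xi C(\xi)\sqrt{n}/\sqrt{2}\}\,\lambda_1\Gamma(m_n)/n^2$, according to the choices for $v$, $H^2$ and $M_1$. Consequently, since $\Gamma(m_n)/n$ is bounded under the condition imposed on $m_n$, $\sum_{m' \in \mathcal{M}_n}II(m^*) \le C/n$.

• Study of $\sum_{m' \in \mathcal{M}_n}I(m^*)$. Denote by $\psi = 2\gamma + \min(1/2 - \delta/2, 1 - \delta)$, $\omega = (1/2 - \delta/2)_+$, $K' = K_1\lambda_1/\lambda_2$; then for $a, b \ge 1$, we infer that

$$\begin{aligned}
\max(a,b)^\psi e^{2\mu\sigma^\delta\pi^\delta\max(a,b)^\delta}e^{-K'\xi^2\max(a,b)^\omega} &\le \left(a^\psi e^{2\mu\sigma^\delta\pi^\delta a^\delta} + b^\psi e^{2\mu\sigma^\delta\pi^\delta b^\delta}\right)e^{-(K'\xi^2/2)(a^\omega + b^\omega)} \\
&\le a^\psi e^{2\mu\sigma^\delta\pi^\delta a^\delta}e^{-(K'\xi^2/2)a^\omega}e^{-(K'\xi^2/2)b^\omega} + b^\psi e^{2\mu\sigma^\delta\pi^\delta b^\delta}e^{-(K'\xi^2/2)b^\omega}.
\end{aligned} \tag{20}$$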
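Inequality (20) only uses $\max(a,b)^\omega \ge (a^\omega + b^\omega)/2$ and the fact that a function of the maximum is bounded by the sum of its values at $a$ and $b$. A quick randomized check in Python is sketched below, comparing the left-hand side with the final right-hand side; the parameter names (`c` for $2\mu\sigma^\delta\pi^\delta$, `k` for $K'\xi^2$) and the sampling ranges are assumptions chosen for illustration.

```python
import math
import random


def lhs(a, b, psi, c, delta, k, omega):
    """max(a,b)^psi * exp(c max^delta) * exp(-k max^omega)."""
    m = max(a, b)
    return m ** psi * math.exp(c * m ** delta - k * m ** omega)


def rhs(a, b, psi, c, delta, k, omega):
    """Final right-hand side of (20)."""
    term_a = a ** psi * math.exp(c * a ** delta - 0.5 * k * a ** omega - 0.5 * k * b ** omega)
    term_b = b ** psi * math.exp(c * b ** delta - 0.5 * k * b ** omega)
    return term_a + term_b


def check(trials=10000, seed=0):
    """Return True if lhs <= rhs (up to rounding) on random draws with a, b >= 1."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.uniform(1, 20), rng.uniform(1, 20)
        psi, delta = rng.uniform(0, 4), rng.uniform(0, 1)
        omega = max(0.5 - 0.5 * delta, 0.0)
        c, k = rng.uniform(0, 2), rng.uniform(0, 5)
        if lhs(a, b, psi, c, delta, k, omega) > rhs(a, b, psi, c, delta, k, omega) * (1 + 1e-9):
            return False
    return True
```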

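Consequently, if we denote by $\tilde\Gamma$ the quantity $\tilde\Gamma(m) = L_m^{2\gamma+\min(1/2-\delta/2,\,1-\delta)}\exp\{2\mu\sigma^\delta(\pi L_m)^\delta\}$ then

$$\begin{aligned}
\sum_{m' \in \mathcal{M}_n}I(m^*) \le{}& \frac{\lambda_2\tilde\Gamma(m)}{n}\exp\{-(K'\xi^2/2)(L_m)^{(1/2-\delta/2)_+}\}\sum_{m' \in \mathcal{M}_n}\exp\{-(K'\xi^2/2)(L_{m'})^{(1/2-\delta/2)_+}\} \\
&+ \sum_{m' \in \mathcal{M}_n}\frac{\lambda_2\tilde\Gamma(m')}{n}\exp\{-(K'\xi^2/2)(L_{m'})^{(1/2-\delta/2)_+}\}.
\end{aligned} \tag{21}$$

1) Case $0 \le \delta < 1/3$. In that case, since $\delta < (1/2 - \delta/2)_+$, the choice $\xi^2 = 1$ ensures that $\tilde\Gamma(m)\exp\{-(K'\xi^2/2)(L_m)^{(1/2-\delta/2)}\}$ is bounded and thus the first term in (21) is bounded by $C/n$. Since $1 \le m \le m_n$ with $m_n$ satisfying the condition imposed on the collection of models, $\sum_{m' \in \mathcal{M}_n}(\tilde\Gamma(m')/n)\exp\{-(K'/2)(L_{m'})^{(1/2-\delta/2)}\}$ is bounded by $C/n$, and hence $\sum_{m' \in \mathcal{M}_n}I(m^*) \le C/n$. Consequently (17) holds if we choose $\mathrm{pen}(m) = 2a(1 + 2\xi^2)\lambda_1(L_m)^{2\gamma+1-\delta}\exp\{2\mu\sigma^\delta(\pi L_m)^\delta\}/n$.

2) Case $\delta = 1/3$. According to the inequality (20), $\xi^2$ is such that $2\mu\sigma^\delta\pi^\delta(L_{m^*})^\delta - (K'\xi^2/2)L_{m^*}^\delta = -2\mu\sigma^\delta(\pi L_{m^*})^\delta$, that is $\xi^2 = (4\mu\sigma^\delta\pi^\delta\lambda_2)/(K_1\lambda_1)$. Arguing as for the case $0 \le \delta < 1/3$, this choice ensures that $\sum_{m' \in \mathcal{M}_n}I(m^*) \le C/n$, and consequently (17) holds. The result follows for $p(m, m') = 2(1 + 2\xi^2)\lambda_1 L_{m^*}^{2\gamma+1-\delta}\exp(2\mu\sigma^\delta(\pi L_{m^*})^\delta)/n$, and $\mathrm{pen}(m) = 2a(1 + 2\xi^2)\lambda_1 L_m^{2\gamma+1-\delta}\exp(2\mu\sigma^\delta(\pi L_m)^\delta)/n$.

3) Case $\delta > 1/3$. If $\delta > (1/2 - \delta/2)_+$, according to (20) we choose $\xi^2 = \xi^2(L_m, L_{m'})$ such that $2\mu\sigma^\delta\pi^\delta(L_{m^*})^\delta - (K'\xi^2/2)L_{m^*}^\omega = -2\mu\sigma^\delta\pi^\delta(L_{m^*})^\delta$, that is $\xi^2 = \xi^2(L_m, L_{m'}) = (4\mu\sigma^\delta\pi^\delta\lambda_2)/(K_1\lambda_1)\,L_{m^*}^{\delta-\omega}$. This choice ensures that $\sum_{m' \in \mathcal{M}_n}I(m^*) \le C/n$, and consequently (17) holds if $p(m, m') = 2(1 + 2\xi^2(L_m, L_{m'}))\lambda_1 L_{m^*}^{2\gamma+1-\delta}\exp(2\mu\sigma^\delta(\pi L_{m^*})^\delta)/n$, associated to the penalty $\mathrm{pen}(m) = 2a(1 + 2\xi^2(L_m, L_m))\lambda_1(L_m)^{2\gamma+1-\delta}\exp(2\mu\sigma^\delta(\pi L_m)^\delta)/n$. □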

6.3 Technical Lemmas 

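Lemma 2. Let $\nu_n(t)$ be defined by (13) and $\Delta_1(m)$ be defined in Proposition 1. Under Assumption $(A_1^{\varepsilon})$,

$$\Big\|\sum_{j \in \mathbb{Z}}|u^*_{\varphi_{m,j}}|^2\Big\|_\infty \le \Delta_1(m), \quad\text{and}\quad \sum_{j \in \mathbb{Z}}\mathrm{Var}[\nu_n(\varphi_{m,j})] \le \Delta_1(m)/n. \tag{22}$$

Proof of Lemma 2. Use the definition of $u^*_{\varphi_{m,j}}(z)$ to get that

$$\sum_{j \in \mathbb{Z}}|u^*_{\varphi_{m,j}}(z)|^2 = \sum_{j \in \mathbb{Z}}\left|\int\exp\{ixz\}u_{\varphi_{m,j}}(x)\,dx\right|^2 = \frac{L_m}{(2\pi)^2}\sum_{j \in \mathbb{Z}}\left|\int\exp\{-ixzL_m\}\exp\{ijx\}\frac{\varphi^*(x)}{f_\varepsilon^*(xL_m\sigma)}\,dx\right|^2.$$

By Parseval's Formula,

$$\sum_{j \in \mathbb{Z}}|u^*_{\varphi_{m,j}}(z)|^2 = \frac{L_m}{2\pi}\int\left|\frac{\varphi^*(x)}{f_\varepsilon^*(xL_m\sigma)}\right|^2 dx = \Delta_1(m), \tag{23}$$

which entails that the first part of the bound is proved. The second part follows since

$$\sum_{j \in \mathbb{Z}}\mathrm{Var}[\nu_n(\varphi_{m,j})] \le n^{-1}\int\sum_{j \in \mathbb{Z}}|u^*_{\varphi_{m,j}}(z)|^2 h(z)\,dz. \qquad\square$$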



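Lemma 3. Let $\Delta_1(m)$ be defined in Proposition 1 and let $R(\mu, \delta, \sigma)$ be the constant defined previously. Then under the assumption $(A_2^{\varepsilon})$,

$$\Delta_1(m) \le \frac{L_m^{1-\delta}(\sigma^2\pi^2 L_m^2 + 1)^\gamma\exp\{2\mu\sigma^\delta\pi^\delta L_m^\delta\}}{\pi^\delta\kappa_0^2 R(\mu, \delta, \sigma)}.$$

Proof of Lemma 3. Under the assumption $(A_2^{\varepsilon})$, $\Delta_1(m) \le (\pi\kappa_0^2)^{-1}(\sigma^2 L_m^2\pi^2 + 1)^\gamma\int_0^{\pi L_m}\exp\{2\mu\sigma^\delta u^\delta\}du$.
If $\delta = 0$, by convention $\mu = 0$, and hence the integral in the previous bound is less than $\pi L_m$.
Consider now the case $0 < \delta \le 1$. Easy calculations provide that

$$\int_0^{\pi L_m}e^{2\mu\sigma^\delta u^\delta}du = \int_0^{\pi L_m}\frac{2\mu\sigma^\delta\delta u^{\delta-1}e^{2\mu\sigma^\delta u^\delta}}{2\mu\sigma^\delta\delta u^{\delta-1}}du \le \frac{(\pi L_m)^{1-\delta}}{2\mu\sigma^\delta\delta}\left[e^{2\mu\sigma^\delta u^\delta}\right]_0^{\pi L_m}$$

and therefore $\int_0^{\pi L_m}\exp\{2\mu\sigma^\delta u^\delta\}du \le [(\pi L_m)^{1-\delta}/(2\mu\sigma^\delta\delta)]\exp(2\mu\sigma^\delta(\pi L_m)^\delta)$.

Now, if $\delta > 1$, we use that $u^\delta = u^{\delta-1}u$, and consequently Lemma 3 follows from

$$\int_0^{\pi L_m}\exp\{2\mu\sigma^\delta u^\delta\}du \le \int_0^{\pi L_m}\exp\{2\mu\sigma^\delta(\pi L_m)^{\delta-1}u\}du \le \frac{(\pi L_m)^{1-\delta}}{2\mu\sigma^\delta}\exp(2\mu\sigma^\delta(\pi L_m)^\delta). \qquad\square$$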
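The two integral bounds used in this proof are easy to sanity-check numerically. The Python sketch below compares a midpoint approximation of $\int_0^{U}\exp(c\,u^\delta)du$ with the bound $U^{1-\delta}\exp(c\,U^\delta)/(c\,r)$, where $c$ stands for $2\mu\sigma^\delta$, $U$ for $\pi L_m$, and $r = \delta$ if $0 < \delta \le 1$, $r = 1$ if $\delta > 1$. These names and the quadrature rule are assumptions made for illustration.

```python
import math


def integral(c, delta, upper, n=20000):
    """Midpoint rule for int_0^upper exp(c * u^delta) du."""
    h = upper / n
    return h * sum(math.exp(c * ((k + 0.5) * h) ** delta) for k in range(n))


def bound(c, delta, upper):
    """upper^(1 - delta) * exp(c * upper^delta) / (c * r), r = delta if delta <= 1 else 1."""
    r = delta if delta <= 1 else 1.0
    return upper ** (1 - delta) * math.exp(c * upper ** delta) / (c * r)
```

For $\delta = 1$ the bound is nearly attained, since the integral then equals $(e^{cU} - 1)/c$, whereas for $\delta > 1$ it is cruder, which matches the "rough bound" wording used for the supersmooth case.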

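Lemma 4. Let $\nu_n(t)$, $\Delta_1(m)$ and $\Delta_2(m, h)$ be defined in (13), Proposition 1 and in (19). Then under $(A_1^{\varepsilon})$,

$$\sup_{t \in B_{m,m'}(0,1)}\|u_t^*\|_\infty \le \sqrt{\Delta_1(m^*)}, \quad \mathbb{E}\Big[\sup_{t \in B_{m,m'}(0,1)}|\nu_n(t)|\Big] \le \sqrt{\Delta_1(m^*)/n},$$

$$\text{and}\quad \sup_{t \in B_{m,m'}(0,1)}\mathrm{Var}(u_t^*(Z_1)) \le \sqrt{\Delta_2(m^*, h)}/(2\pi).$$

Proof of Lemma 4. By combining Cauchy-Schwarz Inequality and (23), the square of the first term $\sup_{t \in B_{m,m'}(0,1)}\|u_t^*\|_\infty^2$ is bounded by $\sup_z\sum_{j \in \mathbb{Z}}|u^*_{\varphi_{m^*,j}}(z)|^2 = \Delta_1(m^*)$. Now, we have

$$\mathbb{E}\Big[\sup_{t \in B_{m,m'}(0,1)}|\nu_n(t)|\Big] \le \mathbb{E}\Big[\Big(\sum_{j \in \mathbb{Z}}(\nu_n(\varphi_{m^*,j}))^2\Big)^{1/2}\Big] \le \Big[\sum_{j \in \mathbb{Z}}\mathrm{Var}(\nu_n(\varphi_{m^*,j}))\Big]^{1/2},$$

which is bounded, by applying the second part of (22) in Lemma 2, by $\sqrt{\Delta_1(m^*)/n}$. Now write that
$\sup_{t \in B_{m,m'}(0,1)}\mathrm{Var}(u_t^*(Z_1)) \le \sup_{t \in B_{m,m'}(0,1)}\mathbb{E}[|u_t^*(Z_1)|^2] \le \big[\sum_{j,k \in \mathbb{Z}}|Q_{j,k}(m^*)|^2\big]^{1/2}$, with $Q_{j,k}(m) = \mathbb{E}[u^*_{\varphi_{m,j}}(Z_1)u^*_{\varphi_{m,k}}(-Z_1)]$ also given by

$$Q_{j,k}(m) = \frac{L_m}{(2\pi)^2}\iint\exp\{ijx - iky\}\frac{\varphi^*(x)\varphi^*(y)}{f_\varepsilon^*(\sigma L_m x)f_\varepsilon^*(\sigma L_m y)}h^*(L_m(x - y))\,dx\,dy.$$

Apply Parseval's Formula to get the result since

$$\sum_{j,k \in \mathbb{Z}}|Q_{j,k}(m)|^2 = \frac{L_m^2}{(2\pi)^2}\iint\left|\frac{\varphi^*(x)\varphi^*(y)}{f_\varepsilon^*(\sigma L_m x)f_\varepsilon^*(\sigma L_m y)}h^*(L_m(x - y))\right|^2 dx\,dy. \qquad\square$$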